ABSTRACT


REAL  ENGLISH:  A  TRANSLATOR  TO  ENABLE 
NATURAL  LANGUAGE  MAN-MACHINE  CONVERSATION 
Author  -  Harvey  Cautin 
Supervisor  -  Morris  Rubinoff 

This dissertation presents a pragmatic interpreter/translator
called Real English to serve as a natural language man-machine
communication interface in a multi-mode on-line information
retrieval system. This multi-mode feature affords the user a
library-like searching tool by giving him access to a dictionary,
lexicon, thesaurus, synonym table, and classification tables
expressing binary relations, as well as the document file
representing the field of discourse. The user is thereby allowed
greater freedom in search strategy.

Real English will 1) syntactically analyze the user's
message  by  means  of  a  string  analysis  grammar  to  produce  a  tree 
representing  the  interrelationships  of  the  grammatical  entities 
comprising  the  message,  2)  use  this  tree  together  with  a  pragmatic 
grammar  to  establish  the  set  of  commands  necessary  to  fulfill  the 
request,  and  3)  form  the  proper  syntax  for  each  command.  The 
strong  linguistic  foundation  of  the  syntax  analyzer  endows  the 
system  with  the  power  of  flexibility.  As  experience  shows  that 
certain  new  structures  occur  and  should  therefore  be  a  part  of 
the  system,  they  may  be  incorporated  into  the  system  by  the 
grammarian  without  a  major  overhaul  of  the  procedures  to  date. 


The user is permitted to phrase his requests in any convenient
form (i.e. declarative, imperative, interrogative, or fragmented
sentences, the last referred to as conversationally dependent
sentences). Thus instead of placing the user in the difficult
position  of  learning  a  new  language,  the  system  is  given  the 
responsibility  of  responding  in  and  to  the  user's  language,  i.e. 
the  man-machine  conversation  is  carried  out  in  a  natural  language. 
PREFACE


The  research  represented  in  this  dissertation  was  carried 
out  in  response  to  a  problem  posed  by  Dr.  Morris  Rubinoff  of 
The Moore School of Electrical Engineering of the University of
Pennsylvania. His long association with the field of information
retrieval has focused his attention on the problem of man-machine
communication in an attempt to provide a library-like environment
which  may  be  readily  learned  by  users. 

The  work  presented  by  the  author  is  to  be  incorporated  into 
the  information  retrieval  system  of  the  Moore  School  Information 
Systems  Laboratory.  This  system  is  presently  being  programmed 
for  the  RCA  Spectra  70/46. 
CHAPTER 1


INTRODUCTION 

As man's knowledge continues to grow at an increasing rate,
it  becomes  ever  more  desirable  that  persons  in  need  of  information 
have  at  their  disposal  a  rapid  and  accurate  method  of  acquiring 
it.  A  computerized  information  retrieval  system  seems  to  offer 
the best chance of achieving this goal. However, problems are
immediately  encountered.  One  important  problem  is  that  requests 
for  information  from  such  systems  have  to  be  formed  in  a  language 
that  the  system  can  'understand'  but  which  by  and  large  is  quite 
foreign  to  the  user.  Therefore  an  intermediary  is  desirable  to 
translate  the  user's  request  from  his  own  natural  language  into 
the  language  used  by  the  system.  Another  important  problem  is 
that  turnaround  time  is  something  less  than  ideal  in  most  computer 
systems;  also,  the  user  generally  does  not  receive  the  desired 
information  either  because  the  system  loses  something  in  the 
translation  process  or  the  user  himself  does  not  have  a  good  idea 
of  what  he  wants  nor  means  for  clarifying  his  ideas.  Furthermore, 
the  system  serves  only  one  mode  of  operation,  namely,  a  document 
file  search,  thereby  limiting  the  user's  search  strategy. 

What  the  searcher  needs  is  a  library  facility  built  into 
the  retrieval  system  in  which  he  can  converse  with  the  system 
as  he  would  with  a  librarian  and  can  find  out  how  the  library  is 
organized,  what  information  can  be  found,  how  it  can  be  found, 
and what to do when in trouble with search procedures. Instead
of putting the user into the difficult position of learning a new
language,  the  system  is  given  the  task  of  responding  in  and  to  the 
user's  language,  i.e.  the  man-machine  conversation  is  carried  out 
in  a  natural  language. 

To  accommodate  the  different  search  strategies  of  many  users, 
the  conventional  library  generally  offers  not  only  the  main  body 
of  documents  but  also  a  thesaurus,  dictionary,  various  indexes, 
and/or  citation  references.  As  a  step  towards  this  goal  in 
computerized  information  systems,  this  dissertation  presents  a 
pragmatic translator to enable truly natural language man-machine
multi-mode* conversation in an on-line information retrieval system
through a remote teletypewriter console. The techniques discussed
in this pragmatic translator are collected together under the label
'Real English' and can be incorporated into various types of
information systems. To illustrate these techniques, Real English has
been designed to be used in the environment established by the
Moore School Information Retrieval System. Basically, Real English
accomplishes: 1) a syntactical analysis (i.e. parse) of the user's
message using a string analysis grammar, 2) use of this parse, and
previous dialogue if necessary, to determine what the user wishes,
and 3) formation of a series of system commands to fulfill the
request.

* A mode is a variety of conversation associated with a
particular data file.

The structure of a pragmatic interpreter/translator such as
Real English depends upon such environmental factors as: 1) the
linguistic style of the users, i.e. the actual grammatical
structures most likely to be used as input to the translator, 2)
the actual symbolic commands acceptable to the retrieval system,
and 3) the computer initiated and directed dialogue occurring as
a result of retrieving the desired information. The acceptance
of a syntactical structure into the grammar of a syntax analyzer
of a pragmatic translator depends upon the preceding three
conditions, whereas the generated list of commands which when
executed will fulfill the request depends entirely upon item 2
above.

The  formal  character  of  the  grammar  incorporated  into  the 
syntactical  analysis  algorithm  differs  from  that  of  phrase 
structured grammars, i.e. the structural description obtained
for  a  given  sentence  will  be  different.  It  is  possible  to  go  from 
one  grammar  to  the  other  although  the  mapping  is  by  no  means 
trivial. 

A  well  developed  syntax  analyzer  program  incorporating  a 
significant  part  of  English  grammar  was  available  to  the  author. 
The  structural  description  provided  by  this  grammar  seemed  well 
suited for the construction of the pragmatic interpreter (e.g. the
recognition of the information clauses which usually appear as
complete adjunct strings is accomplished through the occurrence
of specific nodes on the tree). The basic principles of
constructing the pragmatic interpreter would be equally applicable if the
structural  description  were  of  the  kind  provided  by  a  phrase 
structured  grammar. 

Several systems have been developed over the past decade
that attempt to retrieve information based upon natural language
requests.

The SIR (Semantic Information Retriever) system demonstrates
a conversational and deductive ability through the use of a model
which represents the semantic content from a variety of subject
areas. The data base is highly structured, involving binary
relations expressed as attribute-value pairs. SIR recognizes only
a small number of sentence forms, each of which corresponds to a
particular relation. Like Lindsay's SAD SAM system, the data base
for SIR is first generated by input sentences and then later
queried through natural language statements [8].

In the DEACON system, natural language is treated as a formal
language from which a technique developed by Thompson is used to
determine the meaning of sentences in that language. Thompson's
hypothesis is that "English essentially becomes a formal language
as defined if its subject matter is limited to material whose
interrelationships are specifiable in a limited number of
previously structured categories (memory structures)" [17]. Again, a
highly structured data base is used (ring structures in this case)
for storing the information.

The CUE (Computer Utilize English) system is a document
retrieval system allowing a more varied sentence structure than
the other systems [9]. It is one of the few systems that uses a
complete syntactical decomposition as a basis for its
interpretation. The system by Lamson is predicated on the principle
"that syntactic structure can be replaced by a much simpler kind
of structure without a great loss of meaning in many technical
specialities". Similar techniques are used in other systems
to avoid the essential syntactical analysis needed to achieve a
flexible system.

1.1 Background

As of 1966, the information retrieval system of the Moore
School Information Systems Laboratory (MSISL) performed its
man-machine interaction by means of a Symbolic Command Language.

This  language  was  a  formal  one  requiring  the  proper  syntactical 
use  of  left  and  right  parentheses,  logical  symbols  (such  as 
&  -  and,  +  -  or,  I  -  and  not),  and  index  terms. 

Experiments  were  designed  to  determine  the  degree  of  difficulty 
the  uninitiated  user  would  have  in  learning  to  use  this  Symbolic 
Command  Language  oriented  system.  These  experiments  showed  that 
persons  having  little  exposure  to  any  form  of  abstract  algebra, 
logic  or  programming  found  the  system  extremely  difficult  to 
operate.  Based  upon  these  results,  it  was  decided  that  instead 
of  putting  the  user  in  the  difficult  position  of  learning  a  new 
language, the system itself would be given this task. To test
the feasibility of such a proposal on a small scale, Easy English
was developed as the successor of the Symbolic Command
Language [2,13]. Easy English, a somewhat restricted but nevertheless
real version of English, allowed the user to have his queries,
which were written in English, translated into the Symbolic
Command Language equivalent, at which point the system executed
the request.
The success attending subsequent use at the Moore School
provided  the  incentive  to  develop  an  information  retrieval  system 
with  a  more  sophisticated  pragmatic  interpreter  and  translator  to 
serve  as  the  link  between  man  and  machine  in  a  conversation- 
oriented  environment.  This  new  translator,  Real  English,  is 
designed  to  eliminate  many  drawbacks  of  Easy  English,  namely: 

1)  Easy  English  operated  only  in  the  search  mode,  i.e. 
the  user  could  only  query  the  system's  data  files 
which  referred  to  the  documents  stored.  Through  written 
requests  in  English,  Real  English  will  give  the  user 
access  to  definitions  with  semantic  expansion*,  a 
thesaurus,  synonyms,  and  relationships  expressed  through 
classification  tables. 

2) Easy English could correctly interpret only those
messages that were written in declarative form;
Real English will handle declarative, interrogative,
and imperative.

3) The rather crude grammar used in Easy English led to
a system not flexible enough to accommodate significant
changes in the structure of the user's request.

1.2  Real  English  System 

As illustrated in Figure 1, the Real English System consists
of three distinct components: 1) syntax analyzer, 2) pragmatic
interpreter, and 3) command formatter.

* See Section 2.3.3 for an explanation of "semantic expansion".

The syntax analyzer requires three inputs: the grammar, the
word dictionary, and the input sentence; it produces a tree. The
grammar is composed of strings and restrictions. A grammar string
consists of definitions of the grammatical entity which the string
represents; a restriction consists of tests to determine whether
or not a particular definition of the string is valid for the
sentence being analyzed. The word dictionary, insofar as the syntax
analyzer is concerned, supplies a category list for each word in
the sentence. This list consists of all the categories of the
word: e.g. noun, verb, etc. The syntax analyzer parses the
sentence by constructing a tree (from the string definitions)
whose terminal nodes correspond to the categories of the successive
words of the sentence.
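
The role of the category list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary entries and the names WORD_DICTIONARY, category_lists and candidate_readings are hypothetical and are not taken from the Real English programs. The sketch lists the category assignments that a parser would have to choose among before the grammar strings and restrictions are applied.

```python
from itertools import product

# Hypothetical miniature word dictionary: each word maps to its category list.
WORD_DICTIONARY = {
    "i": ["pronoun"],
    "want": ["verb", "noun"],
    "papers": ["noun", "verb"],
    "on": ["preposition"],
    "radar": ["noun"],
}

def category_lists(sentence):
    """Return the category list of each successive word of the sentence."""
    return [WORD_DICTIONARY[word] for word in sentence.lower().split()]

def candidate_readings(sentence):
    """Enumerate every choice of one category per word.

    A real analyzer would prune these choices with its grammar strings
    and restrictions; this sketch only shows the search space.
    """
    return list(product(*category_lists(sentence)))

readings = candidate_readings("I want papers on radar")
for reading in readings:
    print(reading)
```

Each printed tuple is one possible sequence of terminal-node categories; the grammar decides which of them can actually be built into a tree.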

The pragmatic interpreter or command-set generator determines
the actual system commands to be executed in order to fulfill the
request. In its task, the pragmatic interpreter uses the tree
representation of the sentence developed by the syntax analyzer
together with the pragmatic codes* of the various words of the
sentence from the word dictionary, and the pragmatic grammar**,
which is a series of tests upon the tree to determine the commands
associated with the given sentence.

* 'Pragmatic' codes of a word refer to those codes that are
added to a word's dictionary record solely to aid in interpreting
the intention of the user's message.

** 'Pragmatic' refers to the grammar used to interpret the
'intended' meaning of the user's message.
The syntax of each generated command is formed by the command
formatter.  Use  is  again  made  of  the  word  dictionary  which  also 
holds  information  relating  to  conjunctions,  bibliographic  data 
(author,  editor,  etc.),  and  other  considerations  explained  below. 

1.3  Contributions 

For  the  first  time,  a  user  of  an  on-line  retrieval  system 
will  be  able  to  communicate  with  a  multi-mode  system  by  means  of 
a  natural  language  thereby  allowing  greater  freedom  of  search 
strategy.  For  example,  consider  a  user  who  may  have  only  a  vague 
idea  of  his  needs  in  the  form  of  a  list  of  topic  areas  (i.e.  index 
terms).  This  user  may  proceed  along  several  paths  in  his  quest 
for  information: 

1)  he  may  extract  from  a  thesaurus  terms  related  to  his 
index  set  in  an  effort  to  narrow  or  broaden  his 
search  area  and  then  query  the  document  file  with 
this  newly  created  list. 

2)  he  may  immediately  search  the  file  and  from  the 
results  of  this  initial  search  refine  his  index 
set  for  subsequent  searches. 

3) he may initially search the file using this set of
index terms and from the results continue searching
from a different path (such as by author or publisher).

4) choosing any of the above paths, he may encounter terms
which are unfamiliar to him. In such a case, he may
request clarification in the form of a definition or
example.
It is to be noted that in the above four search strategies,
different  modes  of  system  operation  are  involved: 

1)  Search  mode:  the  user  actually  queries  the  document 
file  for  relevant  documents  pertaining  to  his  needs. 

2)  Dictionary  mode:  clarification  of  uncommon  terms  by 
definition  or  example. 

3)  Synonym  mode:  synonyms  are  supplied  for  stated  terms. 

4)  Relational  mode:  terms  related  to  a  given  list  of  terms 
are  found.  The  relation  itself  may  be  varied  (e.g. 
whole-part, generic-specific, noun-modifier).

The  burden  of  determining  which  mode  is  desired  by  the  user 
and  also  the  particular  command  within  that  mode  to  be  used  falls 
upon  the  system  and  is  a  function  of  Real  English.  In  this  way, 
the  system  becomes  a  library  readily  accessible  through  a  remote 
console  typewriter. 

Also,  for  the  first  time,  an  information  retrieval  system 
will  be  able  to  use  previous  dialogue  as  an  aid  in  resolving 
ambiguities.  Consider  the  following  examples. 

Example 1:

             Mr. A                    Mr. B

Message 1:   Give me anything         What is the meaning
             written about radar      of radar?

Message 2:   How about sonar?         How about sonar?

Notice that in both cases, the second message is syntactically
identical. However, due to the dialogue involved, Mr. A will
receive information from the documents stored in the file dealing
with sonar whereas Mr. B will receive the definition of sonar.

Example  2: 

Message 1:  I want all the papers on cosmic radiation after 1965.

Message  2:  Who  wrote  them? 

Once  again,  a  follow-up  request  is  based  on  a  previous 
message  and  as  such  the  user  would  receive  the  authors  of  all 
the  papers  selected  by  the  first  message. 
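
The way previous dialogue settles the meaning of a fragment such as "How about sonar?" can be sketched as follows. This is a toy illustration, not the Real English procedure: the function name interpret, the fixed request patterns, and the choice of the DESC sector for "written about" are assumptions made for the example. The only idea taken from the text is that the follow-up inherits the command of the preceding message.

```python
def interpret(message, previous=None):
    """Translate a request into a command string (toy patterns only).

    `previous` is the command produced for the preceding message, if any.
    Assumption: the index term is a single word at the end of the command.
    """
    text = message.strip().rstrip("?.").lower()
    patterns = {
        "give me anything written about ": "NUMBER DESC",
        "what is the meaning of ": "DEFINE",
    }
    for prefix, command in patterns.items():
        if text.startswith(prefix):
            return f"{command} {text[len(prefix):].upper()}"
    if text.startswith("how about "):
        if previous is None:
            raise ValueError("fragment cannot be resolved without prior dialogue")
        # Reuse the earlier command and substitute the new term.
        command = previous.rsplit(" ", 1)[0]
        return f"{command} {text[len('how about '):].upper()}"
    raise ValueError(f"unrecognized request: {message!r}")

mr_a = interpret("Give me anything written about radar")
mr_b = interpret("What is the meaning of radar?")
print(interpret("How about sonar?", previous=mr_a))
print(interpret("How about sonar?", previous=mr_b))
```

The same second message yields a search command for Mr. A and a definition command for Mr. B, which is the behavior described in Example 1.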

In  addition,  a  rather  extensive  grammar  has  been  altered  to 
meet  the  special  requirements  of  an  on-line  information  retrieval 
system.  The  environment  existing  with  such  a  system  inherently 
limits  the  grammatical  structures  likely  to  occur.  For  example: 

1) Subjects are likely to be pronouns (e.g. I, You) or
index terms (e.g. Jones, Association for Computing
Machinery).

2) Verbs are those that occur in requests of various
kinds (e.g. like, want, give).

3) Objects are most likely to be simple noun or pronoun
phrases whose adjuncts are either verbal phrases,
prepositional phrases, THAT-clauses, or adjectival
phrases.

4) Strings representing index term sequences (e.g. cosmic
radiation, the theory of heat transfer) must be added
to the grammar.

5) Incomplete sentences (e.g. Papers by Smith; How about
sonar?; Cosmic radiation) must be made acceptable to
the grammar.

6) Conjunctions are limited as to the elements they may
join. In addition, index terms are more easily associated
with the proper sector designator code when they
occur as part of a common subtree joining the same node
representing the sector designator.

The  strong  linguistic  basis  for  the  syntactical  analysis 
endows  the  system  with  the  power  of  flexibility.  As  experience 
shows  that  certain  new  structures  frequently  occur  and  should 
therefore  be  a  part  of  the  system,  they  may  be  incorporated  into 
the  system  without  a  major  overhaul  of  the  existing  procedures. 
Also,  as  more  commands  are  added  to  the  symbolic  commands  of  the 
system,  the  syntax  analyzer  provides  the  keys  to  determining  the 
various  pragmatic  codes  to  be  used. 


[Figure 1 legend: S.C. - System Commands to be executed]


CHAPTER  2 


ENVIRONMENTAL  FACTORS 
INFLUENCING  THE  PRAGMATIC  INTERPRETER 

2.1  Introduction 

Inasmuch as Real English serves as a link between the user and
the system, it must take into account the behavioral
characteristics of each. To understand the user's messages, the system must
have a knowledge of what the user is likely to say and also how
the system is to respond, i.e. which symbolic commands are to be
executed.

2.2  Syntactical  Structure 

In  order  to  get  a  feeling  for  the  syntactical  structures 
likely  to  be  used  by  searchers,  various  people  both  inside  and 
outside  the  University  community  were  asked  to  prepare  written 
requests  for  information  that  they  considered  to  be  normally 
found  in  a  library.  Based  upon  these  requests  (approximately  100) 
and  their  stylistic  variations,  a  grammar  was  developed.  In  order 
to  test  this  initial  grammar,  both  oral  and  written  experiments 
were  performed. 

Both  experiments  were  designed  to  simulate  a  man-machine 
dialogue.  In  the  oral  experiment,  this  dialogue  was  performed 
via  telephone  conversation  whereas  in  the  written  experiment  it 
was  carried  out  through  teletypewriter  communication. 

The purpose of the experiments was to learn how the user
phrases his requests and not whether he actually retrieves any
information from his requests. In order to avoid giving the
subject any hint as to how a request could be phrased and thus
not bias his own natural approach, a bibliography compilation was
used. Such a test involved very little additional explanation on
the part of the experimenter [15,16].

The requests obtained from the experiment were tested in the
initial grammar and pragmatic interpreter, and eighty percent of
them were found to be acceptable. Some sample requests from the
experiments are found in Appendix A.

2.3  System  Commands 

The  commands  of  the  system  may  be  divided  into  groups,  or 
modes,  depending  on  the  particular  data  base(s)  involved  in  their 
execution.  Within  each  mode  of  operation  there  may  be  any  number 
of  commands.  The  present  breakdown  of  commands  follows.  Each 
command's  syntax  consists  of  the  command's  name  followed  by  its 
specification  part. 

2.3.1  Search  Mode 

The  search  mode  makes  use  of  the  document  and  inverted  files. 
The  document  file  contains  the  bibliographic  and  subject  material 
relevant  to  each  document  in  the  system.  The  inverted  file  lists 
for  each  significant  word  (i.e.  index  item)  in  each  of  the 
various  sections  (e.g.  author,  title,  abstract,  etc.)  of  a 
document,  all  the  document  numbers  having  this  particular  index 
item in the given section of the document. The inverted file is
used to form a list of accession numbers satisfying a criterion
based upon index terms. (An index term is a sequence of one or
more index items.) The document file is used to supply the
various bibliographic and subject matter information pertaining
to the set of documents supplied by or to the user. The
commands of the search mode follow.

NUMBER 

The NUMBER command will produce an internal list of accession
numbers satisfying the command's criterion of a logical
construction of informational clauses. Each informational clause consists
of one index term or a logical combination of index terms preceded
by a sector designator. The sector designator indicates the
particular section of the document being referenced. The user
receives the number of references in this internal list, called
the Document List. The sector designators are listed below.

AUTH - indicates author section
DATE  -  indicates  date  section 
TITL  -  indicates  title  section 
EDIT  -  indicates  editor  section 
ISSR  -  indicates  issuer  section 
DESC  -  indicates  subject  content  section 
JOUR  -  indicates  journal  section 
The  informational  clauses  may  be  combined  by  the  logical 
symbols  +  (inclusive  or),  &  (and),  and  f  (and  not).  This  is  also 
true of index terms within any one informational clause.
Parentheses serve the usual purpose of resolving ambiguities by
establishing the desired logical construction.
Example:

NUMBER (AUTH SMITH + DESC RADAR & SONAR) & DATE 1967

Any document that was both

1) written in 1967, and

2) either

i) written by Smith, or
ii) pertaining to radar and sonar

will be selected by this request. The user receives the number
of such documents selected.
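
The logical construction of a NUMBER command maps naturally onto set operations over the inverted file. The following Python sketch evaluates the request above by hand; the accession numbers and the names INVERTED_FILE and postings are invented for illustration, and the sketch does not parse the command text.

```python
# Hypothetical inverted file: (sector designator, index term) -> accession numbers.
INVERTED_FILE = {
    ("AUTH", "SMITH"): {1, 2, 5},
    ("DESC", "RADAR"): {2, 3, 4},
    ("DESC", "SONAR"): {3, 4, 6},
    ("DATE", "1967"): {1, 3, 6},
}

def postings(sector, term):
    """Accession numbers of documents having `term` in the given section."""
    return INVERTED_FILE.get((sector, term), set())

# NUMBER (AUTH SMITH + DESC RADAR & SONAR) & DATE 1967
# '+' is the inclusive or (set union); '&' is and (set intersection).
by_smith = postings("AUTH", "SMITH")
radar_and_sonar = postings("DESC", "RADAR") & postings("DESC", "SONAR")
document_list = (by_smith | radar_and_sonar) & postings("DATE", "1967")

# The user is told only how many documents were selected.
print(len(document_list))
```

With this toy data the Document List holds documents 1 and 3, so the user would be told that two documents were selected.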

COMBINE 

The  COMBINE  command  performs  a  combinatorial  search  of  the 
inverted  file  based  on  the  informational  clauses  appearing  in 
the  command's  specification  part.  Each  information  clause  may 
contain  only  one  index  term  as  no  logical  symbols  are  used  in 
this  command.  The  informational  clauses  are  separated  by  slashes 
(/).  For  each  i,  i  =  1  to  n  where  n  is  the  number  of  index  terms, 
the  user  receives  the  number  of  documents  indexed  by  exactly  i  of 
the  n  index  terms. 

As  an  example  consider: 

COMBINE  AUTH  SMITH/  DESC  RADAR/  SONAR/  DATE  1967 
[Note:  SONAR  will  be  regarded  as  having  the  sector  designator 
DESC.] 

The user will receive information as listed below.

x documents retrieved indexed by exactly 4 of the terms.
y documents retrieved indexed by exactly 3 of the terms.
z documents retrieved indexed by exactly 2 of the terms.
w documents retrieved indexed by exactly 1 of the terms.
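
The counts reported by COMBINE can be computed by tallying, for every document, how many of the n index terms it is indexed by. The sketch below does this with Python's Counter; the posting sets and the function name combine are hypothetical, chosen only to make the example runnable.

```python
from collections import Counter

# Hypothetical postings for: COMBINE AUTH SMITH/ DESC RADAR/ SONAR/ DATE 1967
TERM_POSTINGS = {
    "AUTH SMITH": {1, 2, 5},
    "DESC RADAR": {2, 3, 4},
    "DESC SONAR": {2, 3, 6},
    "DATE 1967": {1, 2, 3},
}

def combine(postings_by_term):
    """Map i -> number of documents indexed by exactly i of the n terms."""
    hits = Counter()
    for documents in postings_by_term.values():
        hits.update(documents)          # one hit per term per document
    exactly = Counter(hits.values())    # how many documents have i hits
    n = len(postings_by_term)
    return {i: exactly.get(i, 0) for i in range(n, 0, -1)}

for i, count in combine(TERM_POSTINGS).items():
    print(f"{count} documents retrieved indexed by exactly {i} of the terms.")
```

The loop prints one line per value of i, from n down to 1, in the same order as the listing above.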

It is possible to limit the output to those lines desired.
This is done by a range specifier placed immediately after the
command name. The range specifier is part of the specification
part.

The range specifier consists of line specifier expressions
joined by an A (representing the logical 'and') or O (representing
the logical inclusive 'or'). A line specifier expression consists
of a quantifier followed by a number m, where m ≤ n. The quantifiers
are:

G  -  greater  than 
L  -  less  than 
E  -  equal  to 

LE  -  less  than  or  equal  to 
GE  -  greater  than  or  equal  to 
The  entire  range  specifier  is  enclosed  in  parentheses. 

As examples consider:

1) (GE2AL4)

The user has limited the output to those documents indexed
by exactly 2 or 3 of the command's index terms.

2) (E2OE3)

The  user  gets  the  same  results  as  in  1. 

3)  (L5) 

The user receives the information about the documents indexed
by exactly 1, 2, 3 and 4 of the given index terms.

As an example of a complete COMBINE command, consider

COMBINE (GE 3) AUTH SMITH/ DESC RADAR/ SONAR/ DATE 1967

The  user  receives: 

x  documents  retrieved  indexed  by  exactly  4  of  the  terms. 

y  documents  retrieved  indexed  by  exactly  3  of  the  terms. 
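
A range specifier can be read as a small boolean expression over the line number i. The Python sketch below parses a specifier and reports which output lines it selects. It rests on two assumptions that the text does not settle: the connectives A and O are applied strictly left to right, and blanks inside the parentheses are ignored. The function names are hypothetical.

```python
import operator
import re

# Quantifiers from the text, mapped to comparison functions.
QUANTIFIERS = {
    "G": operator.gt,
    "L": operator.lt,
    "E": operator.eq,
    "LE": operator.le,
    "GE": operator.ge,
}
TEST = r"(?:GE|LE|G|L|E)\d+"  # one line specifier expression

def parse_range_specifier(specifier):
    """Return a predicate telling whether output line i is selected."""
    body = specifier.replace(" ", "")
    if re.fullmatch(rf"\({TEST}(?:[AO]{TEST})*\)", body) is None:
        raise ValueError(f"malformed range specifier: {specifier!r}")
    tests = [(QUANTIFIERS[q], int(m))
             for q, m in re.findall(r"(GE|LE|G|L|E)(\d+)", body)]
    connectives = re.findall(r"[AO]", body)

    def selects(i):
        compare, m = tests[0]
        result = compare(i, m)
        # Assumption: connectives are applied strictly left to right.
        for connective, (compare, m) in zip(connectives, tests[1:]):
            value = compare(i, m)
            result = (result and value) if connective == "A" else (result or value)
        return result

    return selects

def selected_lines(specifier, n):
    """Line numbers (1..n) that the specifier keeps in the output."""
    selects = parse_range_specifier(specifier)
    return [i for i in range(1, n + 1) if selects(i)]

print(selected_lines("(GE2AL4)", 4))
print(selected_lines("(GE 3)", 4))
```

For the four-term COMBINE command above, the first call keeps lines 2 and 3 and the second keeps lines 3 and 4, in agreement with the examples in the text.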

FORM 

The  NUMBER  and  COMBINE  commands  discussed  above  form  lists 
of  document  numbers.  Further  dialogue  determines  the  particular 
sections  of  each  document  desired  by  the  user.  The  lists  so 
formed  were  based  on  the  index  terms  appearing  in  the  specifica¬ 
tion  part  of  the  command. 

The FORM command forms a list of accession numbers from
those that appear in the specification part. Any number (≥ 1)
of accession numbers, separated from each other by a comma, can
comprise the specification part.

Example

FORM 2,1763,576

Bibliographic Commands

Each  section  of  a  document  has  its  own  corresponding  command 
so  as  to  enable  that  piece  of  information  to  be  extracted  from 
each  document  listed  in  the  Document  List.  Each  of  the  following 
commands  will  return  to  the  user  the  indicated  information  for 
each  document  in  the  Document  List. 

AUTH
TITL
DATE
EDIT
ISSR
JOUR
DESC

In addition, DESC/BIBLIO and DESC/ALL will return, respectively,
all the bibliographic information and all the information of each
document in the Document List.

2.3.2  Thesaurus  Mode 

The Thesaurus Mode commands make reference to the lexicon
file which is an alphabetical listing of all words relevant to
the field of discourse of the information retrieval system. In
all the following commands, α and β are strings of letters.

THES/X - Returns to the user a given number* of lexicon words,
each beginning with the letter string α, as in
THES/X α

THES/BF - Returns to the user a given number of lexicon words
alphabetically before the letter string α, as in
THES/BF α

THES/AF - Returns to the user a given number of lexicon words
alphabetically after the letter string α, as in
THES/AF α

THES/AR - Returns to the user a given number of lexicon words
alphabetically surrounding the letter string α, as in
THES/AR α

THES/BT - Returns to the user a given number of lexicon words
alphabetically between α and β, as in
THES/BT α, β

* The actual number of words retrieved is established by the
system administrator.

These commands will accept any number of α's separated by
commas. (The THES/BT command will accept any number of pairs of
letter strings.) Examples are:

THES/X α1, α2

THES/BT α1, β1, α2, β2
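
Because the lexicon file is an alphabetical listing, each Thesaurus Mode command amounts to a binary search followed by a slice. The sketch below uses Python's bisect module on a small sorted list. The lexicon contents, the limit of three words, the function names, and the treatment of the bounds as exclusive are all assumptions made for the example; THES/AR is omitted.

```python
from bisect import bisect_left, bisect_right

# Hypothetical lexicon, kept in alphabetical order.
LEXICON = sorted(["RADAR", "RADIATION", "RADIO", "RANGE",
                  "RECEIVER", "SONAR", "SOUND"])
LIMIT = 3  # assumed number of words set by the system administrator

def thes_x(alpha):
    """THES/X: words beginning with the letter string alpha."""
    start = bisect_left(LEXICON, alpha)
    words = []
    for word in LEXICON[start:]:
        if not word.startswith(alpha):
            break
        words.append(word)
    return words[:LIMIT]

def thes_bf(alpha):
    """THES/BF: words alphabetically just before alpha."""
    end = bisect_left(LEXICON, alpha)
    return LEXICON[max(0, end - LIMIT):end]

def thes_af(alpha):
    """THES/AF: words alphabetically just after alpha."""
    start = bisect_right(LEXICON, alpha)
    return LEXICON[start:start + LIMIT]

def thes_bt(alpha, beta):
    """THES/BT: words strictly between alpha and beta (assumed exclusive)."""
    start = bisect_right(LEXICON, alpha)
    end = bisect_left(LEXICON, beta)
    return LEXICON[start:end][:LIMIT]

print(thes_x("RAD"))
print(thes_bf("RANGE"))
print(thes_af("RANGE"))
print(thes_bt("RADAR", "RECEIVER"))
```

A command given several letter strings would simply repeat the corresponding lookup for each string or pair of strings.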

2.3.3 Define Mode

Execution  of  the  commands  in  this  mode  makes  use  of  the 
DEFINE  file  whose  records  consist  of  a  definition  and  several 
levels  of  semantic  expansion  for  each  word  of  the  file.  Semantic 
expansion  is  a  tool  to  provide  explanations  and  illustrations  at 
several  levels  of  detail  which  serve  to  instruct  the  searcher  on 
the  meanings  of  words  and  their  use  in  the  system.  The  definition 
of  the  unknown  term  may  be  considered  first-level  expansion. 

Further  levels  of  expansion  would  be  successively  a  one  paragraph 
description,  an  example,  and  finally  a  fully  detailed  illustration. 
See  Appendix  B  for  an  example  of  semantic  expansion. 

The specification part includes a level specifier and a series
of words separated by commas. The level specifier is a series of
level numbers separated by commas and enclosed in parentheses.
The level numbers refer to the level of expansion desired for each
word listed. If the level specifier is omitted, the desired level
is assumed to be one.

DEFINE (1,2) RADAR

This command will return the first two levels of semantic
expansion of the word 'RADAR' to the user.
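The structure of the specification part (an optional parenthesized level specifier followed by a comma-separated word list) can be sketched as a small parser. The function name and the exact whitespace handling are assumptions; only the command shape and the default level of one come from the description above.

```python
import re


def parse_define(command):
    """Split a DEFINE command into (levels, words).

    The level specifier is optional; when it is omitted the level
    defaults to one, as described in the text.
    """
    match = re.fullmatch(r"\s*DEFINE\s*(?:\(([^)]*)\))?\s*(.+?)\s*", command)
    if match is None:
        raise ValueError("not a DEFINE command: %r" % command)
    specifier, word_part = match.group(1), match.group(2)
    levels = [int(n) for n in specifier.split(",")] if specifier else [1]
    words = [w.strip() for w in word_part.split(",") if w.strip()]
    return levels, words
```

For example, `parse_define("DEFINE (1,2) RADAR")` yields the levels 1 and 2 for the single word RADAR, while `parse_define("DEFINE RADAR, SONAR")` falls back to level 1 for both words.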

2.3.4  Relation  Mode 

The commands of the Relation Mode make use of binary
relations stored in the Relation Mode File. These relations may
be generic-specific, noun-modifier, whole-part, or word-words of
its definition. Only the generic-specific relation is used in this
system.

RELATION  Command 

The specification part contains a series of words separated
by commas preceded by a relation specifier which is a series of
numbers each separated by a comma and enclosed in parentheses.
The numbers of the relation specifier indicate the particular
relations defined for each word listed.

Example:

RELATION (8) AUTOMOBILE

Assuming 8 indicates the relation 'is generic to', the execution
of this command would give the user all terms generic to
automobile. The absence of the relation specifier would indicate
that all relations be applied to the list of words.

2.4  System-Directed  Dialogue 

It is to be recalled that the execution of the NUMBER and
COMBINE commands of the search mode results in an internal list
of accession numbers being formed and the user being informed of
the number of documents so listed. If these commands are not
followed by a bibliographic command, the system initiates a
computer-directed search to ascertain which sections of these
documents are of concern to the user. Note that the user, with
Real English, has the choice of bypassing this dialogue by
specifying the desired sections in his request. This dialogue
makes the user aware of the various sections of the documents.
It also influences the grammar of the system, since it has the
further purpose of limiting the user's messages by eliminating a
call to the EXPLAIN mode in the event the user would not know what
to do after he got the number of references satisfying his initial
NUMBER or COMBINE command.


CHAPTER  3 

SYNTACTICAL  ANALYSIS 


The syntactical decomposition of the user's message is performed
by means of string analysis. This analysis characterizes
the sentences of a language as consisting of one elementary
sentence (its center), plus zero or more elementary adjuncts,
i.e. word-sequences of particular structures which are not
themselves sentences and which are adjoined immediately to the right
or left of an elementary sentence or adjunct, or of a stated
segment of an elementary sentence or adjunct, or of any one of
these with adjuncts adjoined to it [6]. An elementary sentence or
adjunct is a string of words, the words (or particular sequences
of them) being its successive segments.

The particular syntactical analysis performed in the Real
English system is based upon one that was initially developed at
the University of Pennsylvania under the direction of Professor
Z. Harris and continued at New York University by Dr. N. Sager.
This grammatical analysis consists of three separate programs:
a - the program to generate the word dictionary records
b - the program to generate the grammar records
c - the program which analyzes sentences using the word
dictionary, grammar, and input sentence as inputs
The grammar is composed of strings and restrictions. A
grammar string consists of definitions of the grammatical entity
which the string represents; a restriction consists of tests to
determine whether or not a particular definition of the string is
valid for the sentence being analyzed. The word dictionary supplies
a category list for each word in the dictionary. This list
consists of all the categories of the word: e.g. noun, verb, etc.
The sentence analyzer parses a sentence by constructing a tree
(from the string definitions) whose terminal nodes correspond to
the categories of the successive words of the sentence.

In the context used here the syntactical analysis is not an
end in itself as it is at N.Y.U. under Dr. Sager, but only a means
to an end; i.e. the tree produced to represent the parse of the
sentence is used by the system to determine the necessary system
commands to be executed to fulfill the user's request, and to
form the proper syntax for each such command.

Because of the different environment in which the parser is
to be used in the Real English system, several changes in the
N.Y.U. grammar were carried out. The N.Y.U. grammar is highly
sophisticated as to the type of sentences it can properly
analyze. This degree of sophistication was unwarranted in the
present application because the linguistic structures likely to
be encountered were restricted by the environment in which the
grammar is used. In addition, certain sentence structures that
are likely to occur in the Real English system (e.g. the
conversationally-dependent sentences discussed in Section 4.2.4) are
excluded in the original grammar. Because the Real English
grammar is written so as to produce a parse that would lend
itself to relatively easy interpretation, the restrictions used
in the treatment of conjunctions, adjuncts, and objects have been
changed. In addition, restrictions were added to aid in the proper
alignment of index terms with their associated grammar string.

The pragmatic interpreter, which tests that certain conditions
prevail with respect to the tree, has been incorporated into the
system as grammar restrictions. The logic of the pragmatic
interpreter involves procedures to scan the tree, grammar,
or sentence list. These procedures have already been, for the
most part, established for use by the syntactical analyzer, so
that coding the pragmatic interpreter in the form of a restriction
list which will call upon these established routines will result
in magnetic core memory efficiency. In addition, the grammarian
may easily change this logic without altering any of the
established programs.

3.1  Word  Dictionary 

The words are arranged in the dictionary as follows:
1) a word is placed into one of fourteen groups, i.e. Group 1
contains words of one or two characters, Group 2 contains words
of three or four characters, etc., 2) within each group the words
are put into lexicographical order according to their numerical
representation as individual characters. A word W has a set of
category assignments C_1 or C_2 or ... or C_n, where each C_i
corresponds to a part of speech, a particular word, or to a special
condition. C_i is the same as the name of an atomic string in the
grammar. The C_i's form a category list C. Each category may have
a set of subcategories. Therefore C consists of L_1 C_1 or
L_2 C_2 or ... or L_n C_n, where L_i is C_i1 and C_i2 and ... and
C_im. L_i is a set of properties of W for the C_i category.
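The two-step arrangement of the dictionary (group by word length, then order within the group by character codes) can be sketched as a sort key. This is an illustration only: it assumes the "one or two, three or four" pattern continues through all fourteen groups, and it uses Python's `ord` values as a stand-in for the machine's own numerical character representation.

```python
def group_number(word):
    """Group 1 holds words of 1-2 characters, Group 2 of 3-4, and so on.

    Assumption: the pattern simply continues for the remaining groups.
    """
    return (len(word) + 1) // 2


def dictionary_order(words):
    """Order words by group, then by the numeric codes of their characters."""
    return sorted(words, key=lambda w: (group_number(w), [ord(c) for c in w]))
```

With this key, a lookup only has to search the one group that matches the length of the word in question.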

3.2 Grammar Strings

A grammar string S is defined to be S_1 or S_2 or ... or S_n.
The S_i's are called the options of S. The S_i's are defined to be
S_i1 and S_i2 and ... and S_im. The S_ij's are the elements of the
ith option and are themselves grammar strings like S. Therefore,
we have a system of strings composed of other strings. However,
the system is not infinite. The process ends with atomic strings.
An atomic string is a symbol which is not composed of any other
string; it corresponds to a word category whereas the non-atomic
string represents a broader grammatical entity.

To allow for a more refined and compact grammar, restrictions
were added to the options of a string. The complete definition
of S is actually R_1 S_1 or R_2 S_2 or ... or R_n S_n, where R_i is
a series of tests to be performed upon the sentence or tree. If
R_i fails, then S_i is not a proper choice for S.

The strings are represented in the machine as lists. The
first word in the list of a string S is the head of S which
contains its code name S and certain properties of S. The second
word points to R_1 S_1, the first option of S and its restriction.
In general the (n+1)st word of S points to R_n S_n, the nth option
and its restriction. The options are also lists. The list of
each option is similar to the list of a string except that it
has no head. The first word of S_i points to the head of S_i1, the
first element of the ith option of string S. In general the nth
word of S_i points to the head of S_in, the nth element of S_i.
There is a slight difference in atomic strings. The head of the
atomic string has an atomic signal and the string is composed of
only a head.
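The list representation just described (a head, then one entry per option, each option being a list of element strings with an optional restriction) might be modelled as follows. The class and field names are illustrative assumptions; the B5B definition used in the example is the one given later in Section 3.7.3.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# eq/repr are disabled because grammar strings may refer to themselves.
@dataclass(eq=False, repr=False)
class Option:
    elements: List["GrammarString"]                  # S_i1, S_i2, ..., S_im
    restriction: Optional[Callable[..., bool]] = None  # R_i, if any


@dataclass(eq=False, repr=False)
class GrammarString:
    name: str                                        # code name kept in the head
    options: List[Option] = field(default_factory=list)

    @property
    def atomic(self):
        """An atomic string consists of a head only (no options)."""
        return not self.options


# B5B DEFIN ((A100),(B5B,A100)): the second option refers back to B5B itself.
A100 = GrammarString("A100")
B5B = GrammarString("B5B")
B5B.options = [Option([A100]), Option([B5B, A100])]
```

The self-reference in the second option of B5B shows why the system of strings is finite even though a string may be composed of strings like itself: every expansion eventually bottoms out in an atomic string such as A100.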

3.3  The  Tree 

The  tree  is  constructed  by  the  analyser  program  from  the 
grammar  strings.  It  is  actually  a  record  of  the  options  and  their 
elements  which  the  program  has  chosen  from  the  definitions  of  the 
strings. 

Each unit of the tree is called a node and corresponds to an
element  of  an  option.  The  node  N  takes  up  eight  half-words  in 
memory.  N  consists  mainly  of  pointers:  an  "UP"  or  "LEFT"  pointer, 
a  "DOWN"  pointer  and  a  "RIGHT"  pointer  which  point  respectively  to 
the  node  above  or  to  the  left  of  N,  to  the  node  below  N  and  to  the 
node  to  the  right  of  N.  N  also  has  a  "GRAMMAR"  pointer  whose 
function  will  be  explained  later. 

Suppose the program has just attached a node NS corresponding
to a string S and must now look at the definition of S and choose
an option from it. Suppose the option S_i (which consists of
elements S_i1, S_i2, ..., S_im) is the correct choice for S. Nodes
NS_i1, NS_i2, ..., NS_im will be attached to the tree in the
following manner:

1) NS will have a "DOWN" pointer to NS_i1.

2) NS_i1 will have an "UP" pointer to NS and a "RIGHT"
pointer to NS_i2.

3) NS_i2 will have a "LEFT" pointer to NS_i1 and a "RIGHT"
pointer to NS_i3.

4) NS_im will have a "LEFT" pointer to NS_i(m-1) but no
"RIGHT" pointer.

5) Each node NS_ij will have a "GRAMMAR" pointer to set up the
correspondence between NS_ij and:

a) S_ij, i.e. the name of the corresponding string

b) its place in the definition of S, i.e. it occupies
the ij-th position in the definition of the parent
node of S.

If we draw the tree starting from S, it will look like:

NS
|
NS_i1 - NS_i2 - NS_i3 - NS_i4 - ... - NS_i(m-1) - NS_im

The part of the tree under NS is called the substructure of NS.
NS_i1 through NS_im are all one level below NS; hence, NS is their
parent (node). NS_i1, however, is the only node directly below NS.
The node structure via the position and grammar pointers shows
the particular option of S chosen during the analysis. The context
in which S was chosen can be ascertained by looking at the
parent node of S. If in the course of building the tree it were
found that the particular option used for S is incorrect, the
tree would point (via the grammar pointer) to the place in the
grammar where the choice was made. This would enable another
option for S to be chosen and a different substructure for NS
to be constructed.

Since each NS_ij corresponds to a string, it will in general
have a substructure similar to that of NS shown above. However,
if S_ij is atomic it cannot have such a substructure since it is a
symbol and does not consist of strings. It corresponds to a part
of speech, and the current word W must have a category matching
S_ij. If W does, then NS_ij is complete, and it corresponds to the
analysis of W. If there is no match, the choice was incorrect
and a different substructure would have to be built for S, the
parent node. When the tree is complete for a sentence, it means
all the branches are complete and all the words of the sentence
correspond to some atomic node of the tree via the matching
process. Thus, the tree represents a parse of the sentence.
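The pointer scheme for attaching the nodes of a chosen option can be sketched as below. This is an assumption-laden illustration, not the original half-word layout: the class, the function names, and the tuple stored in the grammar pointer are invented, and only the UP-or-LEFT, DOWN, RIGHT and GRAMMAR roles come from the text.

```python
class Node:
    def __init__(self, name, grammar=None):
        self.name = name
        self.up_or_left = None   # parent for a first child, left sibling otherwise
        self.down = None         # first node one level below
        self.right = None        # next sibling to the right
        self.grammar = grammar   # (parent string, option index, element index)


def attach_option(parent, element_names, option_index):
    """Attach one node per element of the chosen option under `parent`."""
    previous = None
    for j, name in enumerate(element_names, start=1):
        node = Node(name, grammar=(parent.name, option_index, j))
        if previous is None:
            parent.down = node          # only the first element hangs below
            node.up_or_left = parent    # its pointer acts as "UP"
        else:
            previous.right = node
            node.up_or_left = previous  # its pointer acts as "LEFT"
        previous = node
    return parent.down


def parent_of(node):
    """Walk left to the first sibling, then follow its UP pointer."""
    while node.up_or_left is not None and node.up_or_left.down is not node:
        node = node.up_or_left
    return node.up_or_left
```

Because each node records where in the grammar it came from, discarding a wrong option amounts to clearing `parent.down` and calling `attach_option` again with the next option of the same string.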


3.4 The Restrictions

A restriction is a series of routines with their arguments
which operate on the tree or any of the lists (grammar, word
dictionary, or sentence lists). The restrictions are part of the
grammar and therefore determined by the grammarians. By means of
these routines the tree or list structure may be examined for
different properties, e.g. wellformedness of substructures. A
restriction is itself represented in the machine as a list. Each
routine in the restriction is executed in order; if any routine
in the list fails the restriction fails.
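The evaluation rule for a restriction (run the routines in order, fail as soon as one fails) is short enough to sketch directly. The two sample routines and the dictionary standing in for the tree are hypothetical; they are not among the routines of Appendix C.

```python
def run_restriction(restriction, tree):
    """Execute each (routine, arguments) pair in order.

    The restriction succeeds only if every routine succeeds; the first
    failing routine ends the evaluation.
    """
    for routine, args in restriction:
        if not routine(tree, *args):
            return False
    return True


# Hypothetical routines, for illustration only.
def has_word(tree, word):
    return word in tree["words"]


def lacks_word(tree, word):
    return word not in tree["words"]
```

Representing a restriction as plain data, a list of routine names with arguments, is what lets the grammarian change the tests without touching the analyzer program itself.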

To give some idea of the versatility of the routines which
make up the restrictions, they are listed below according to
their group identification. For more details about any particular
routine, refer to Appendix C.


a) Major Routines - A restriction list composed of these
routines is the restriction R_i found on the option S_i
of string S.

WELLF, SUBJR, VERBR, MARK, DSQLF, SPECF, CHECF

b) Logical Routines - These routines perform logical tests
or operations upon other restriction lists.

TRUE, AND, OR, IMPLY, CANDO, NOT, COMMf, ORFTH, ITER, EXPNT

c) Climbing Routines - These routines move around the tree
or a list.

UPONE, UPTRN, UPTO, LEFT, RIGHT, DOWN1, DOWN, DWNTO, DNTRN,
DNRIT, NEXTL, PREVL, ATTRD, IRTOL

d) Property Routines - These routines test certain
properties of the tree, list or sentence.

WDATM, EMPTY, ISIT, ATTRB, FIND, PARSE, WORDL, SENTL, RARE, REP

e) Tree to List Routines - These routines use the tree to
get to a list.

FRSTL, LASTL, INTOL, EXEC, ATTRB, NEXTL

f) List Producing Routines - These routines work on the
tree or a list to produce another list.

GENER, EDIT, SPCFY, SCOPE

g) Register Routines - These routines use certain registers
to save and restore nodes or list words.

STORE, LOOKT

h) Pragmatic Routines - These routines have been added to
the original analyzer system to aid in the pragmatic
interpretation of the sentence.

BITIN, BITT, FETCH, FILLIN, NEXTAT, PLACE, SETT, TSET

3.5  Cross-Reference  Table 

Since each grammar record is independent of the others the
strings must be linked by a common linkage table, called the
cross-reference table. Each string must have its name in the
common cross-reference table so that it can be referenced by
other strings. It is also advantageous to have independent
records for restrictions or lists that are used frequently. If
the name of a restriction or list is placed in the cross-reference
table, a separate grammar record may be set up for it and may
be referenced by any other record.

3.6 Examples of Grammar Record, Word Record and Parse

As an illustration of a grammar record consider that of COB
shown in Figure 2. Each entry in the record is made up of three
distinct fields: location, operation, and variable field,
respectively. The DEFIN in the operation field indicates that
the options for the grammatical entity named in the location field
are expressed in the variable field. The PATH entries indicate
that the location field names a restriction list made up of the
routines with their arguments that appear in the variable field.
The various options are enclosed in parentheses. Within each
option, the elements are separated by commas. Therefore there are
four options for the center string, COB. These are, respectively,
C1 - the declarative string, COD - the question strings, C3 - the
imperative string, and C4 - the conversationally dependent strings.
Each of the first three options has a restriction. If the function
of each routine of each restriction list is looked up in Appendix
C, it will be seen that restriction list C0.10 checks that in
order to try the declarative option (C1) there must be no question
mark in the sentence; restriction list C0.3 checks that in order
to try the question strings (COD) there must be a question mark in
the sentence; and finally restriction list C0.20 checks that in
order to try the imperative string (C3) there must not be a
question mark in the sentence and furthermore there must be an
untensed verb mark (A30) on the main category list of some word
of the sentence.
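The effect of the three restrictions on the center string can be summarized as a filter over its four options. The sketch below is a paraphrase, not the DSQLF/WORDL/ITER routine chain itself: the sentence is assumed to be a list of (word, category set) pairs, the question mark is detected by the word itself rather than by its category, and the category assigned to "papers" in the usage example is hypothetical.

```python
def candidate_center_options(sentence):
    """Return the center-string options whose restrictions are satisfied.

    `sentence` is a list of (word, categories) pairs, where `categories`
    is the set of main category names of the word.
    """
    question = any(word == "?" for word, _ in sentence)
    untensed = any("A30" in cats for _, cats in sentence)

    options = []
    if not question:
        options.append("C1")    # declarative: no question mark
    if question:
        options.append("COD")   # question strings: question mark present
    if not question and untensed:
        options.append("C3")    # imperative: no "?" and an untensed verb
    options.append("C4")        # conversationally dependent: unrestricted
    return options
```

For the sentence "Give me papers." with "give" carrying A30, this yields C1, C3 and C4 as options still worth trying; the parser then attempts them in the order in which they are defined.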

For a correspondence between the symbolism expressed in
Section 3.2 and the above example, note:

S_1 = (C1)     R_1 = C0.10
S_2 = (COD)    R_2 = C0.3
S_3 = (C3)     R_3 = C0.20
S_4 = (C4)

S_11 = C1, S_21 = COD, S_31 = C3, S_41 = C4

*        COB      CENTER STRINGS
COB      DEFIN    ((C0.10,(C1),C0.3,(COD),C0.20,(C3),(C4)),T
C0.10    PATH     ((DSQLF((C0102,$X1))))
C0101    PATH     ((WORDL)(ITER(C0104,RNEXTL)))
C0102    PATH     ((NOT(C0101)))
C0.3     PATH     ((DSQLF((C0101,$X2))))
C0.20    PATH     ((DSQLF((C0102,$X1)(C0201,$X3))))
C0201    PATH     ((WORDL)(ITER(C0202,RNEXTL)))
C0202    PATH     ((INTOL(CLSSL(A30))))
C0104    PATH     ((INTOL(CLSSL(A49))))
END      COB

Figure 2

The word dictionary record of written will serve to illustrate
the discussion of Section 3.1. Considering Figure 3, the main
category list of written contains one element, A32 - past
participle (or Ven category). The sublist of A32 contains subcategory
information about the word as a past participle, e.g. the active
objects (sublist of BO) are expressed through the (B2) option,
the passive objects (sublist of PO) are expressed through the
(A60) option, and in addition the BVC symbol indicates that
written is a bibliographic verb, i.e. the verb can be used to
indicate an index term's bibliographic status. Note that since
no verb can take all the possible object strings in the English
language, those object strings that are permissible are listed in
the verb's word dictionary record so as to lower the parse time.

WORD     WRITTEN
         LISTIS   (.1,A32)
.1       LISTIS   (.2,BO,.3,PO,BVC)
.2       DEFOBJ   ((B2))
.3       DEFOBJ   ((A60))
END      WRITTEN

Figure 3

Consider the parse shown in Figure 4. The nodes represent
grammatical entities as described in Figure 5. It can be seen
that the particular option chosen for any string may be obtained
by looking at the nodes one level below the node in question.
For example, the option chosen for C1 is (B21,B1,B21,C1A,B21,BO,
B43,B21).

From the parse it can be seen that:

1) the user's message was a declarative statement as the
COB (center string) string has the C1 option. The C1
as seen from Figure 5 is the declarative string.

2) the subject of the sentence (the B1 node) is a pronoun
as is seen by the A21.

3) the verb of the sentence (the C1A node) is a tensed
verb.

4) the object of the verb is the pronoun string:
'anything written about radar'.


SYMBOL   STRING NAME

A21      Pronoun
A24      Preposition
A31      Tensed verb
A32      Past participle form of verb
A48      Period
A60      Empty string. Not all the elements of an option
         are required to correspond to some words of the
         sentence. When a sentence occurs in which these
         elements do not have any correspondence with words
         of the sentence, they are satisfied by the empty
         string.
A100     Index item
BO       Object string
B1       Subject string
B2       Noun strings as object
B4       Pronoun string
B5       Index term sequence with right adjuncts
B21      Sentence adjuncts
B31      Tensed verb with adjuncts
B32      Past participle with adjuncts
B41      Right adjunct of noun phrase
B42      Right adjunct of auxiliary words (e.g. may, can,
         would)
B43      Right adjunct of verb

Figure 5


Figure 5 (Cont.):

SYMBOL   STRING NAME

A63      Left adjunct of verb
B66      Left adjunct of preposition
B69      Left adjunct of pronoun
B90      Active object strings
B99      Passive object strings
C0       Superstring - the root of every tree
COA      Introducer
COB      Center string
COC      End mark
C1       Declarative
C2       Yes-no question string
C3       Imperative
C4       Conversationally-dependent sentences
C20      Prepositional phrases
C1A      Verb
C132     Ven plus passive object
B5A      Index item
B5B      Index term

Note the choice that was made as to the placement of 'about
radar'. It appears as a prepositional phrase (C20) acting as a
right adjunct (B43) of the verb 'written'. It might be argued
that this prepositional phrase should be the passive object of
'written' or placed with the post-object B43. The decision is
based upon interpretation ease and will be further explained later.

3.7  Grammatical  Considerations 

In the Real English system the parse produced by the syntax
analyzer is not the end result but only the starting point in the
pragmatic interpretation of the user's message. The grammar used
by the parser is written with this in mind. When a choice can be
made concerning, for example, the placement of adjuncts, the
decision is based upon the resulting ease of interpretation and
consistency with the remainder of the system. As will be seen in
Chapter 4, wherever practical the pragmatic interpreter treats
the user's message as if it were of the form "I want ...". The
missing part is filled in with what is called the ultimate object
of the sentence. This ultimate object may be viewed as the
starting point in the sentence of the informative material. In
making this transformation, care must be taken so that the
ultimate object appears in the parse as an object.

The discussion to follow, which deals with various
considerations in developing the grammar, finds most of its applicability
in the search mode of operation as the examples will illustrate.
It is to be noted, at this time, that search mode queries may
be looked upon as a series of clauses each of which references a
bibliographic piece of data. The grammar will cause the parser
to produce a tree that clearly shows these inherent references.
Each such clause will contain index terms of some kind together
with a distinguishing verb to determine the type of index term
(i.e. author, title, etc.).

3.7.1 'Aspectual' Verbs

Aspectuals, for example, like, want, wish, desire, have the
property of taking another verbal phrase as object and then
having this verb contain an object that could have been the
object of the aspectual verb. In other words, the aspectuals act
as meta-verbs of the lower level verb that contains an object
common to both verbs. If this second verb is itself an aspectual,
the process could repeat. The final object of this sequence could
be substituted into the message "I want ...", and then treated,
insofar as the interpretation is concerned, as the original
message. This final object referred to above is the 'ultimate object'
of the sentence. Note that the ultimate object has brought into
focus that part of the sentence containing the informative
material as far as command recognition is concerned, and that
indeed the ultimate object is an object string. Each aspectual
has a list (ultimate object list) of object strings that could
cause it to behave aspectually. The verb comprising this object
string must have the proper subcategory necessary to accept
the string as an intervening object string.


Figure  6 


As an example, consider "I would like to have anything
written on radar". The parse of this sentence shown in Figure 6
reveals that the object of like is the C130 string 'to have ...'.
It happens that indeed C130 is on the ultimate object list of
like and also that have is an acceptable verb for C130 to act as
an intervening object string. The object of have, B2, is not on
the ultimate object list of have and as such is the ultimate
object of the sentence.
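The descent through a chain of aspectual verbs to the ultimate object can be sketched as a loop. Everything beyond the "like / C130 / have / B2" example is an assumption: the table contents, the `Clause` record, and the simplification of the parse tree to a chain of verb-object pairs are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tables: which object strings make a verb behave aspectually,
# and which verbs may head such an intervening object string.
ULTIMATE_OBJECT_LISTS = {"like": {"C130"}, "want": {"C130"}, "have": set()}
INTERVENING_VERBS = {"C130": {"have", "like", "want"}}


@dataclass
class Clause:
    verb: str
    object_name: str                  # name of the object string, e.g. "B2"
    inner: Optional["Clause"] = None  # verbal phrase inside the object, if any


def ultimate_object(clause):
    """Follow aspectual verbs downward until a non-aspectual object is found."""
    while True:
        inner = clause.inner
        aspectual = clause.object_name in ULTIMATE_OBJECT_LISTS.get(clause.verb, set())
        acceptable = (
            inner is not None
            and inner.verb in INTERVENING_VERBS.get(clause.object_name, set())
        )
        if not (aspectual and acceptable):
            return clause.object_name
        clause = inner
```

For "I would like to have anything written on radar", the chain like (C130) then have (B2) stops at B2, so B2 is the object substituted into "I want ...".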

3.7.2 Omission

Consider: "What has Jones written on radar?" This message
is syntactically divided into the word what and the C2 (yes-no
question) string with a missing (or omitted) noun as indicated by
the A61 node as the object of written (see Figure 7). This inner
C2 string with omission may be derived from 'Has Jones written N*
on radar?', where what replaces N. In any event, a transformation
on this C2 string will transform the given sentence into the
desired form of 'I want (ultimate object)'. In the case of the
original sentence, the ultimate object is '(omission) on radar'.
Therefore the pragmatic interpreter would consider the given
sentence as 'I want N on radar'. The apparently lost information
expressed by 'has Jones written' is acknowledged by recognizing
Jones as an author (this recognition process will be explained in
Chapter 4) and storing this information prior to the start of the
pragmatic interpretation based upon 'I want N on radar'.

* The N represents a noun or pronoun phrase.

Parse of: 'What has Jones written on radar?'

Figure 7


Note that the parse did separate the two informative clauses,
namely, 'has Jones written' and 'on radar' instead of treating
the prepositional phrase 'on radar' as a right adjunct of written
as was done in Figure 4. The problem of establishing the correct
division among informational clauses of a sentence and having the
proper node used as the ultimate object is handled by the joint
grammar definitions and restrictions of the right adjuncts of
verbs (i.e., B43), and the omission option of the object noun
strings (B2). A few examples will help illustrate the principles.
Refer to Figures 8-12 for the respective parses of sentences 1-5
below.

1) What do you have written on radar or sonar?

2) Give me something that has been written by Jones on
radar or sonar.

3) Give me something that Jones has written on radar
or sonar.

4) I want all the papers you have on radar or sonar.

5) What have you on radar or sonar?

Sentence 1 is derived from 'Do you have anything written on radar
or sonar?' and as such have should have an omitted object whose
right adjunct is 'written on radar or sonar'. Comparing
sentences 2 and 3, one notes that both have two informational
clauses the second of which is identical. The first informational
clause of each sentence is a C69 (i.e. THAT + C1 with an omitted
noun). In sentence 2, the verbal construction is passive and as
such the index term follows the verb. Therefore the prepositional
phrase (C20), 'by Jones', is associated with 'written'. In
sentence 3, the verbal construction is active and as such the index
terms, if any, precede the verb. Therefore the prepositional
phrase 'on radar' is not associated with written but instead forms
its own informational clause. Sentence 4 has a C70 (C1 with an
omitted noun) as a right adjunct of papers. This C70 is derived
from the complete C1: 'you have something on radar or sonar'.
Therefore the object of have is again the omitted noun with 'on
radar or sonar' as its adjunct. Sentence 5 is similar to the
first sentence in that the object of have is the omitted noun.

* The pragmatic interpreter considers the noun of N as indicative
of no particular system file. Therefore N may be replaced by
anything or something.

Parse of: 'Give me something that has been written by Jones on
radar or sonar.'

Figure 9

Parse of: 'Give me something that Jones has written on radar or
sonar.'

Figure 10

Parse of: 'I want all the papers you have on radar or sonar.'

Figure 11

Listed below are the restrictions on B43 and the omitted
option of B2 that accomplish the above divisions.

B43 Restrictions

1) Only bibliographic verbs in the past participle form
(i.e. BVC on the sublist of the A32 category) may have
non-zero right adjuncts. A zero adjunct is A60. These
verbs include: written, authored, edited, published,
dated, etc.

2) If a Ven which is also a BVC occurs in the C131 (Ven as
an active object) then B43 is zero.

B2 Restrictions for the (A61,B41) Option

1) If B2 occurs as an object (BO) then either

a) the BO is an element of C1 which is an element of [...]

b) the BO is an element of C131 or C136 (the untensed
verb, V, with its adjuncts) which is an element of
C1 or C2't or

c) the BO is an element of C2 whose verb is a form of
have and whose subject is a pronoun.

3.7.3  Adjuncts 

The B2 option mentioned above is (A61,B41) and not just
(A61) because the former yields the advantage of early attachment
of the right adjuncts of the omitted word directly to the omission
mark. Other situations exist in which an adjunct string (usually
B41, the right adjunct of nouns and pronouns) is placed as an element
of a parent string so as to cause a more rapid parse. One common
example applies to the index term sequence, B5.

Consider the elementary index term sequence, B5:

B5   DEFIN  ((B5B,B41)),T

B5B  DEFIN  ((A100),(B5B,A100))

As can be seen, B5B is a recursive string in that if the first
option is tried and fails, the second option will cause another
B5B node to be attached below the original B5B node. Again the
first option will be tried and again it will fail, and the process
would continue indefinitely except for the fact that the parser
recognizes recursive strings and requires that a recursive string
S starting with word count W be successful (n-1) times before S_n
(the string S for the n-th recursion) is allowed to be attached
with a word count of W.

Now assume a message as follows:

'Give me anything on radar written by E. J. Smith.'

Strictly speaking, both of the informational clauses, on radar
and written by E. J. Smith, are right adjuncts of the pronoun
anything, i.e. they are both right adjuncts or B41 strings. The
B41 string is repetitive, which means that its last option is
(B41,B41), i.e. it consists of two elements each of which is the
string itself. The purpose of such an option is to allow for
two or more successive occurrences of the string. Again an
infinite loop could be set up if a repetitive string S is attached
to the tree and all of its options (except the last) fail. To
prevent such a situation the parser requires that in order to try
the last option of a repetitive string with word count W, one of
the previous (n-1) options must have been successful in analyzing
at least one word of the sentence starting at the W-th word.
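
The two loop guards just described, one for recursive strings such as
B5B and one for repetitive strings such as B41, can be illustrated with
a small recognizer. This is a minimal sketch and not the parser of the
dissertation: the function names, the representation of a sentence as a
list of words, and the toy B41 options ('on <term>' and 'written by
<term>') are all assumptions made for illustration. The explicit count
of (n-1) successes is replaced here by an equivalent loop that only
extends a string after it has already succeeded.

```python
def parse_b5b(words, start, is_index_item):
    """Recognize B5B = (A100) | (B5B, A100) from position `start`.

    The recursive option is only used after the string has already
    succeeded at the same starting position, so the n-th nested B5B
    is accepted only after n-1 successes. Returns the end position,
    or None if not even one A100 is found.
    """
    # First option (A100): must succeed before any recursion is allowed.
    if start >= len(words) or not is_index_item(words[start]):
        return None
    end = start + 1
    # Second option (B5B, A100): extend the previous success by one A100.
    while end < len(words) and is_index_item(words[end]):
        end += 1
    return end


def parse_b41_once(words, pos):
    """Hypothetical single B41: 'on <term>' or 'written by <term>'."""
    if words[pos:pos + 1] == ["on"] and pos + 1 < len(words):
        return pos + 2
    if words[pos:pos + 2] == ["written", "by"] and pos + 2 < len(words):
        return pos + 3
    return None


def parse_repetitive(words, start, parse_once):
    """Recognize a repetitive string whose last option is (S, S).

    Another occurrence is attempted only if the previous attempt
    consumed at least one word, which rules out an infinite loop.
    Returns the (start, end) spans of the successive occurrences.
    """
    spans = []
    pos = start
    while True:
        end = parse_once(words, pos)
        if end is None or end == pos:  # no progress: do not repeat
            break
        spans.append((pos, end))
        pos = end
    return spans


if __name__ == "__main__":
    sentence = "on radar written by Smith".split()
    print(parse_repetitive(sentence, 0, parse_b41_once))
```

In this sketch each informational clause is built exactly once, which
is the saving that the text attributes to placing B41 directly in the
B5 option.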

Consider the events of parsing if B5 did not have the B41 as
an element of its option. The prepositional phrase would be
attached to the tree as right adjunct of the pronoun anything.
Since written by E. J. Smith would then fail to fit into any of the
remaining strings to be placed on the tree, the parser would
backtrack to the B41 string, attempting to use its last option. Because
it was already successful starting from the word on, the parser
would allow the last option to be tried. Thus, 'on radar' would
again be attached to the tree under the first B41 of the option
(B41,B41) and 'written by E. J. Smith' would be attached under the
second B41. Therefore the parser was required to construct the
substructure for 'on radar' twice. If this substructure had been
longer and more complicated, this repetition in construction would
add unnecessary delay to the parsing mechanism. Having the B41 in
the B5 option permits the parser to immediately attach the second
informational clause (i.e. written by E. J. Smith) to the tree.

In the interpretation phase of the system, this is taken into
account in analysing the B41 element of a B5 string.

3.7.4  Conjunctions 

The grammar does not contain coordinate conjunctional strings
in the main body of strings. If conjunctions (and other "special"
words of the language) were accounted for explicitly in the
strings, the grammar would attain very large proportions. To
avoid this, a "special process" mechanism exists in the parser
which allows for the insertion of an element in an already defined
string of the grammar upon the appearance of a conjunction (or
other "special" word) in the sentence.

Suppose word W has just been analyzed and word W+1 has
become the current word. If W+1 has a special process mark M on
its category list, the sublist of M is obtained. It is a list T,
defined to be R_1 T_1 or R_2 T_2 or ... or R_k T_k, where R_i is a
series of tests and T_i is an option of T. If NS_ij (node
representing the string S_ij) is the current node, a node NT_k is
attached (if R_k permits) to the right of it. Thus when (and if) NS
is complete, the nodes NS_i1, ..., NS_ij, NT_k1, ..., NS_im appear
below it, as if S were defined to be S_i1 and ... and S_ij and T_k1
and ... and S_im. Each element T_ki is the grammar string which
contains the definition of the special process word. Thus if W is
'and', its T list has one option and has one element, which is the
grammar string representing conjunctions. However, words marked
special may appear in a sentence in one of their non-special uses.
Therefore a word with a special mark M is first treated specially,
and if no analyses can be produced the special process marker M must
be ignored at this point (NS_ij) in the analysis and the regular
procedure followed until the analysis of NS is complete.

In the case of simple coordinate conjunctions the special
process node is M1 for 'and', M2 for 'or', M3 for 'but', M4 for
'nor', and M5 for 'as well as'.

The last element of the one option for each of the above MJ
(J = 1 to 5) strings is the conjunctional string, Q1, which
produces its own set of options. Given that MJ has been inserted
following NS_il of string S, the tree would look like:

[Tree diagram: S dominates S_i1, S_i2, ..., S_il and MJ; MJ
dominates the atomic for the conjunction and Q1.]

Q1 produces the following set of options:

(S_il)
(S_i(l-1), S_il)
(S_i(l-2), S_i(l-1), S_il)
...
(S_i1, S_i2, S_i3, ..., S_il)

That is, it carries out the process of structural repetition on
the elements that were current when the conjunction appeared,
namely, S_i1, ..., S_il.
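
The option set produced by Q1 is the set of suffixes of the elements
already attached under the host string. A minimal sketch follows; the
function name and the use of plain strings to stand for string
elements are assumptions for illustration only.

```python
def q1_options(current_elements):
    """Return the structural-repetition options generated by Q1.

    `current_elements` is the list [S_i1, ..., S_il] of elements
    attached under the host string when the conjunction appeared.
    The options are the suffixes of that list, shortest first.
    """
    n = len(current_elements)
    return [tuple(current_elements[n - k:]) for k in range(1, n + 1)]


if __name__ == "__main__":
    # Hypothetical element names standing for S_i1, S_i2, S_i3.
    for option in q1_options(["S_i1", "S_i2", "S_i3"]):
        print(option)
```

Trying the shortest suffix first corresponds to conjoining only the
element nearest the conjunction before wider repetitions are
considered.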

According to the procedure outlined above, an M-node will be
attached at each succeedingly higher level of the tree until it
is accepted as such or until the top of the tree is reached, in
which case backtracking will take place. At each such level the
Q1 string will generate a set of options representing structural
repetition at that level. From the experiments performed
concerning the written syntactical structures most likely to occur,
it has been found that it is necessary to permit such M-node
attachment to the right of only a limited number of nodes. These
nodes are listed below:

1) B5 (index term sequence) to permit a logical combination
of index terms.

2) B24 (quantifier strings) to permit a logical combination
of quantifier sequences as used in the Combine command.
For example:

'Give me the papers indexed by at least two but not
more than four of the following terms: radar, sonar,
laser, maser, pacer.'

3) BO (object strings) to permit multiple requests. A
multiple request is a message indicating that the system
is to respond to more than one request. For example:

'I want the author of document 110 and the title of
anything on radar.'

In such a case, the parse is accomplished and the system
proceeds to analyse only the first request contained in
the message.

4) B32 (Ven with adjuncts) to permit a sequence of
bibliographic verbs. For example:

'What has been written, edited, or published by
Smith?'

5) C91 (noun phrases) to permit a sequence of related nouns
to be joined. For example:

'What papers or books have been written on game
theory?'

6) B41 (right adjunct of noun or pronoun) to permit
conjoining information clauses of various kinds. For
example:

'How about anything on radar and written by Smith.'

A comma is treated as a simple coordinate conjunction unless
it is followed by a different coordinate conjunction. That is,
in the sequence 'Give me any stuff concerning radar, sonar, and
laser.', the first comma is treated as a conjunction whereas the
second comma is treated as pure punctuation. If this were not the
case, the following might occur. Considering Figure 13, note that
the second Q1 (i.e. Q1_2) is about to generate the option (A51),
which cannot fit the sequence. The point is that the combination
', and' is one conjunction and is treated as stated above. As a
result the tree would be as shown in Figure 14. The special
process string for the conjunction comma, M10, is similarly
limited as to its left neighbors on the tree. The M10 to the
right of A100 is allowed since, in this usage, the comma is not a
conjunction, there being no Q1 string as an element of M10.
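
The comma rule can be sketched as a single pass over the words of the
message. This is an illustration under stated assumptions, not the
special-process mechanism itself: the function name, the token list,
and the restriction to one-word conjunctions (so that 'as well as' is
left out) are choices made here for brevity.

```python
# One-word simple coordinate conjunctions named in the text.
COORDINATE_CONJUNCTIONS = {"and", "or", "but", "nor"}


def classify_commas(tokens):
    """Label each comma as 'conjunction' or 'punctuation'.

    A comma immediately followed by a coordinate conjunction is pure
    punctuation (', and' acts as one conjunction); any other comma
    is treated as a simple coordinate conjunction.
    """
    labels = []
    for i, tok in enumerate(tokens):
        if tok != ",":
            continue
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
        kind = "punctuation" if nxt in COORDINATE_CONJUNCTIONS else "conjunction"
        labels.append((i, kind))
    return labels


if __name__ == "__main__":
    tokens = ["radar", ",", "sonar", ",", "and", "laser"]
    print(classify_commas(tokens))
```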

In the case of correlative (or scope-marked) conjunctions,
the special process node is M11 for 'both...and...', M12 for
'either...or...', M13 for 'neither...nor...', and M14 for
'not only...but also...'.

In terms of the operation of structural repetition by which
conjunctional strings are obtained, the scope-marker (C') words
either, neither, both, and not only can be seen to mark the point
in the host string beyond which elements cannot be 'repeated' in
the conjunctional string. That is, a scope-marker-and-conjunction
pair C'...C marks off a structure X (string or string-segment)
which also appears following C in the sentence: C'XCX. Since X
is the expected structure when C' occurs, it is possible to define
a special process, initiated by C', which inserts the string
C'XC at the interrupt point, where X is the specified string or
string-element; when this string is satisfied the program reverts
to its normal operation and finds X as it expected before the
interruption.

The  element  in  the  one  option  for  any  scope-marked  special 
process  string  is  the  scope-marked  conjunctional  string  Q2  which 
produces  all  of  the  possible  options  for  X  from  the  remaining 
string  elements  to  be  attached  to  the  tree  at  the  present  level. 

Consider the string S = (S_i1, S_i2, ..., S_in) and the following
tree:

[Tree diagram: S dominates S_i1, S_i2, S_i3, ..., S_i(l-1), S_il,
where l < n.]

Upon occurrence of a scope-marked conjunction, say M12, the tree
is expanded to:

[Tree diagram: S dominates S_i1, ..., S_il and M12; M12 dominates
A2 (either) and Q2.]

The options produced by Q2 are:

(S_i(l+1))
(S_i(l+1), S_i(l+2))
...
(S_i(l+1), S_i(l+2), ..., S_i(n-1), S_in).

Assuming that the second option, (S_i(l+1), S_i(l+2)), were actually
correct, the tree would appear as:

[Tree diagram: S dominates S_i1, ..., S_il, M12, S_i(l+1), S_i(l+2),
S_i(l+3), ..., S_i(n-1), S_in; M12 dominates A2 (either), Q2, which
dominates S_i(l+1) and S_i(l+2), and A52 (or). A dotted enclosure
surrounds the inserted structure.]

The dotted enclosure is the inserted structure C'XC.

Based once again on the results of the language study
experiments discussed before, it was discovered that the scope-marked
conjunctions also have a restricted set of left attaching nodes.

These correlative conjunctions find their widest applicability
in structures involving the logical construction of index
terms and as such may be attached to and, ',', or, but and the
prepositions that usually signal the oncoming index term sequence.
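
The options that Q2 generates for a scope-marked conjunction, as
described earlier in this section, are the prefixes of the string
elements still to be attached at the current level. A minimal sketch,
in which the function name and the plain-string element names are
assumptions for illustration:

```python
def q2_options(elements, attached):
    """Return the options for X generated by Q2.

    `elements` is the full definition [S_i1, ..., S_in] of the host
    string and `attached` is l, the number of elements already on
    the tree when the scope marker appeared (l < n). The options
    are the prefixes of the remaining elements, shortest first.
    """
    remaining = elements[attached:]
    return [tuple(remaining[:k]) for k in range(1, len(remaining) + 1)]


if __name__ == "__main__":
    # Five hypothetical elements, two of them already attached.
    for option in q2_options(["S1", "S2", "S3", "S4", "S5"], 2):
        print(option)
```

Q1 looks backward over elements already attached, whereas Q2 looks
forward over elements still expected; the two functions differ only in
which end of the element list they slice.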


CHAPTER  4 


PRAGMATIC INTERPRETER

4.1  Introduction 

After the construction of the tree, the system may be thought
of as existing in one of four states. The pragmatic interpreter
is concerned directly with states 1 and 2 and will be discussed in
the present chapter*. The third and fourth states which are,
respectively, the specification filler and the organizer will be
discussed in Chapter 5.

A  description  of  the  system  states  follows: 

State  1:  System  is  engaged  in  the  Ultimate  Object  Analysis 
which  determines  the  starting  node  to  be  used  in 
the  interpretation  of  the  message. 

State 2: Starting from the ultimate object or other node
selected by the Ultimate Object Analysis, the
command-set (not necessarily with the syntax
required for each command) is determined.

State 3: This state is involved if the command-set includes
either the NUMBER or COMBINE command. The syntax
for these commands is formed during this state.
The process, to be discussed in Chapter 5, depends
on the command.

With  the  NUMBER  command,  the  system  will: 
a)  associate  the  proper  sector  (i.e.  author,  title, 
etc.)  with  each  index  term,  and 

* The term "pragmatic interpreter" implies that the system is
interested in the 'intended' meaning of the user's message.

2) C5 String - Question with wh- with noun-omission.

The question (or wh-) word may be what, who.

Examples: Who wrote documents 1296, 301 and 2?
What is generic to automobile?
What do you have on sonar, and either laser
or maser?

3) C6 String - Question with wh- with or without
noun-omission.

The wh- word may be when, how, where.

Examples: How is radar defined?
Where was document 16 published?
When was accession number 412 written?

4) C9 String - Question with wh + noun with noun-omission.

The wh- word may be how many, what, which.

Examples: What words do you have starting with st?
How many papers on radar are there in the file?
Which words are synonymic to procedure?

The philosophy behind the analysis of question strings is
to 1) transform the given message into an equivalent declarative,
2) compare the two parses to determine the first node of a
common information clause, and 3) if other informational clauses
seem to be ignored by an analysis of the equivalent declarative
starting from this point, analyse these clauses "according to the
clues contained therein".

Such clues mentioned above are stored in the word records
and are brought out by several system packages, each of which is
called upon in various situations. Some system packages with
examples of their use in the general philosophy follow.

1. The Set Command Package

A Ven (past participle form of verb) or A30 (untensed verb) in
a C131 (Ven + O active) or C136 (V + O) string respectively may
be associated with a group of index terms. The word records of
this verb contain information indicating the command to be set.
(It is to be noted that only the NUMBER and COMBINE commands
require elaborate syntax formation. The other commands need
at most a listing of the index terms.) In the event that the
NUMBER or COMBINE is set, the word record also supplies the data
necessary to decide the index term's sector designation (i.e.
author, title, abstract, etc.). A code representing this sector
together with the index terms are put into the 'ARGUMENT' buffer
as a partial syntax formation of part of the user's message.

This package finds its application in the ultimate analysis
of various question strings. As an example, consider

'What has Jones written on radar?'

The parse of this C5 string is shown in Figure 15. Considering
the given sentence as it might appear in an equivalent declarative
sentence, the parse of 'I want anything Jones has written on
radar.' shown in Figure 16 will suffice.

[Figure 15. Parse of: 'What has Jones written on radar?']

[Figure 16. Parse of: 'I want anything Jones has written on radar.']

From Figure 16 it may be seen that the two informational
clauses 'Jones has written' and 'on radar' are right adjuncts
of anything (considered to be analogous to the omission mark of
Figure 15 since the word anything implies no reference to any
particular file as would papers or documents). Comparing Figures
15 and 16, it is seen that their first node of common structure
is the B41 node of the (A61,B41) option of B2, which becomes the
ultimate object. In order to proceed from this point as a
declarative, however, the other informational clause must first
be analysed. To do so, the Set Command Package interrogates the
word record of written for an SC code on the sublist of the
category used in the parse, i.e. the sublist of A32.


WORD    WRITTEN
LISTIS  (.1,A32)
.1      LISTIS  (.2,BO,.3,BO,.4,SC,BVC)
.2      DEFOBJ  ((B2))
.3      DEFOBJ  ((A60))
.4      LISTIS  (.41,Z1)
.41     LISTIS  \&)
END     WRITTEN

Figure 17

The sublist of SC contains the bit (Z1 is the first bit, Z2
is the second, etc.) to be set in the flag word representing
the command to be used in the execution of the request.
According to Figure 17, written causes the NUMBER command to be
set since, as shown below, the first bit represents the NUMBER
command.

Bit Position   Command      Bit Position   Command

     1         NUMBER            7         THES/BF
     2         COMBINE           8         THES/AF
     3         SYNONYM           9         THES/X
     4         DEFINE           10         THES/BT
     5         RELATION         11         THES/AR
                                12         FORM

The analysis of the informational clause containing
written is treated as occurring in the declarative of Figure 16
(i.e. Jones has written). The SEM Sublist Value Package discussed
below completes the analysis of this clause. Because the
command-set has been determined, the ultimate object analyzer
will ready the system for State 3 execution by pointing to the
B41 node mentioned previously. State 3 will continue the
analysis by considering the analogous declarative sentence.
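
The way a Z-code from a word record selects a command can be sketched
as a bit operation on the flag word. This is an illustration only: the
function names are hypothetical, and treating bit position 1 as the
least significant bit is an assumption, since the text does not state
the bit order. Only the positions legible in the table above are
included.

```python
# Bit positions and commands as listed in the table above.
# Position 6 is not listed there, so it is left out (not guessed).
COMMAND_BITS = {
    1: "NUMBER", 2: "COMBINE", 3: "SYNONYM", 4: "DEFINE",
    5: "RELATION", 7: "THES/BF", 8: "THES/AF", 9: "THES/X",
    10: "THES/BT", 11: "THES/AR", 12: "FORM",
}


def set_command(flag_word, z_code):
    """Set the bit named by a Z-code such as 'Z1' in the flag word.

    Assumption: bit position 1 is the least significant bit.
    """
    position = int(z_code[1:])
    if position not in COMMAND_BITS:
        raise ValueError(f"no command listed for bit position {position}")
    return flag_word | (1 << (position - 1))


def commands_in(flag_word):
    """List the commands whose bits are set, in bit-position order."""
    return [name for pos, name in sorted(COMMAND_BITS.items())
            if flag_word & (1 << (pos - 1))]


if __name__ == "__main__":
    flags = set_command(0, "Z1")  # the SC sublist of 'written' holds Z1
    print(commands_in(flags))
```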

Notice that for this sentence, the ultimate object analyzer
performed the following:

1) The command-set was determined by the SET COMMAND
PACKAGE.

2) One informational clause was partially analyzed.

3) The node representing the remaining informational
clause was found (viz: the B41 of the (A61,B41)
option of B2).

4) The system was readied for State 3 execution.

Notice that sentences like those below are similarly analysed.

Has Carter or Wilson written on game theory?

Did Greene edit a book on network analysis?

2.  The  Index  Term  Lister  Package 

The syntax of many of the system commands requires a list of
index terms (an index term is composed of one or more index items)
separated by commas. When the command set is one requiring such
a format, the system will execute the Index Term Lister Package.
The substructure of the B5 node contains an atomic A100 for each
index item of the index term represented by the parent B5. This
package gathers all the index items from the appropriate
substructure of the tree and places their EBCDIC representation
into the 'ARGUMENT' buffer, separating the index terms by commas.

Consider:

'What is reentrant code and time sharing?'

Recognizing that the message is a C5 whose main verb is a form of
BE, subject is an index term sequence and object is empty, causes
the DEFINE command to be set. The Index Term Lister Package is
executed starting at the subject node B1. This will cause the
index items reentrant, code, time and sharing to be placed in
the 'ARGUMENT' buffer with a comma separating reentrant code from
time sharing.
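
The effect of the Index Term Lister Package on the buffer can be
sketched as follows. The function name and the nested-list input are
assumptions for illustration; the traversal of the parse tree and the
EBCDIC encoding are omitted, so the sketch shows only how index items
are grouped into comma-separated index terms.

```python
def list_index_terms(index_terms):
    """Build the comma-separated index term list for the buffer.

    `index_terms` is a list of index terms, each given as the list
    of its index items (the A100 atoms found under one B5 node).
    Items within a term are separated by a blank, terms by a comma.
    """
    return ",".join(" ".join(items) for items in index_terms)


if __name__ == "__main__":
    terms = [["reentrant", "code"], ["time", "sharing"]]
    print(list_index_terms(terms))
```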

3. The SEM Sublist Value Package

In a previous example, viz: 'What has Jones written on
radar?', it was stated that together with the sector codes obtained
from the word record of written the index term Jones was put into
the 'ARGUMENT' buffer. This is not entirely true. Consider the
message: 'What have Jones and Wilson written on radar?'. This
message differs from the previous request in that the subject
string is satisfied by a logical construction of index terms. A
call for the Index Term Lister Package would cause Jones and
Wilson, separated by a comma, to be placed in the 'ARGUMENT'
buffer. Because the NUMBER command requires A, + or t between
index terms of the same sector designator and not commas, problems
would arise. Therefore to keep the logical structure of the
message, the SEM Sublist Value Package is executed.

The SEM Sublist Value Package will be briefly introduced
below and more fully developed in Chapter 5. Every significant
word occurring in an informational clause of a NUMBER command has
a SEM code on the sublist of the category used for that word
occurrence. Most such words have only one value on their SEM
sublist. This value may or may not have its own sublist. In such
cases that a word has more than one value, the context of its
usage as indicated by the parse dictates which value is used.
Therefore the function of this package is to place the SEM value
of every word in the sentence in the indicated substructure under
consideration into the 'ARGUMENT' buffer. Index terms have their
EBCDIC representation together with their SEM value placed in
the buffer. The purpose of having such a package is to make for
uniform analysis of the various syntactical structures that could
constitute an informational clause. This will be brought out more
fully in Chapter 5.

Returning to the example 'What have Jones and Wilson
written on radar?', after executing this package the 'ARGUMENT'
buffer will contain the following:

200 Jones 52 200 Wilson 101

200            SEM value preceding each index term
Jones, Wilson  index terms
52             value of SEM for and
101            SEM value of written, holding key to
               sector designation
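
The buffer contents shown above can be reproduced by a short sketch.
It is not the package itself: the function name and the table layout
are hypothetical, and the only SEM values used are the three that
appear in the example (200 before an index term, 52 for and, 101 for
written). How the package obtains these values from the word records
is left to Chapter 5.

```python
# SEM values taken from the buffer example above; nothing else is assumed.
SEM_VALUES = {"and": 52, "written": 101}
INDEX_TERM_SEM = 200  # value shown in front of each index term


def sem_buffer(subject_tokens, verb):
    """Build the 'ARGUMENT' buffer for a subject of conjoined index terms.

    Each index term is preceded by its SEM value, a conjunction is
    replaced by its own SEM value, and the SEM value of the verb
    closes the buffer, so the logical structure is kept.
    """
    out = []
    for tok in subject_tokens:
        if tok.lower() in SEM_VALUES:
            out.append(str(SEM_VALUES[tok.lower()]))
        else:
            out.extend([str(INDEX_TERM_SEM), tok])
    out.append(str(SEM_VALUES[verb]))
    return " ".join(out)


if __name__ == "__main__":
    print(sem_buffer(["Jones", "and", "Wilson"], "written"))
```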

4.2.4 Conversationally-Dependent Sentences

Conversationally-dependent sentences are of two types:
1) they are responses to a system reply to a previous request
and as such are abbreviated, or 2) they are requests in which the
user has used that part of a declarative corresponding to the
ultimate object. Examples of each type are:

Type  1:  How  about  Jones. 

And  Smith. 

Allot. 

Type  2:  Documents  by  Greene. 

Synonymic  to  instruction. 

Words  generic  to  motor  vehicle. 

The type 2 conversationally-dependent sentence is treated as if
it were preceded by 'I want ...', which is equivalent to saying
that the sentence is the ultimate object. Type 1 requests are
entirely different as they are based upon previous dialogue.

Consider the following two cases:

          CASE I                            CASE II

user:     I want the definition of radar.   I want everything about radar.
system:   I don't know.                     I don't have anything.
user:     How about sonar?                  How about sonar?

The second user response in both cases is identical, yet their
pragmatic interpretation must be different since in Case I a
definition is requested whereas in Case II a document search is
requested on the index term 'sonar'. In Case II, the system
uses the first sector designator of the original request as the
designator of 'sonar'. Therefore after each request, a record is
kept of:

1) the command-set

2) the sector code of the first index term used in
a NUMBER or COMBINE if such a command were the
previous command.

To summarize, then, in the interpretation of type 1
conversationally-dependent sentences the system will use the previous
command together with the newly supplied index terms.
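
The record kept after each request, and its reuse for a type 1
sentence, can be sketched as below. The class name, field names,
dictionary layout and the sector code 'subject' are hypothetical and
serve only to illustrate the two cases discussed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreviousRequest:
    """Record kept after each request (hypothetical layout)."""
    command_set: List[str]                   # e.g. ["DEFINE"] or ["NUMBER"]
    first_sector_code: Optional[str] = None  # kept for NUMBER / COMBINE only


def interpret_type1(previous, new_index_terms):
    """Reuse the previous command-set with the newly supplied index terms."""
    request = {
        "commands": list(previous.command_set),
        "terms": list(new_index_terms),
    }
    # A search command also inherits the first sector designator.
    if any(c in ("NUMBER", "COMBINE") for c in previous.command_set):
        request["sector"] = previous.first_sector_code
    return request


if __name__ == "__main__":
    case_1 = PreviousRequest(["DEFINE"])             # definition of radar
    case_2 = PreviousRequest(["NUMBER"], "subject")  # everything about radar
    print(interpret_type1(case_1, ["sonar"]))
    print(interpret_type1(case_2, ["sonar"]))
```

The same follow-up, 'How about sonar?', thus yields a definition
request in Case I and a document search in Case II.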

4.2.5  Examples 

Below are listed user messages along with the corresponding
transformed message used in the analysis. Also shown, when
applicable, are the 'ARGUMENT' buffer, the state of the system to
be entered and the ultimate object.

1) I want some papers written by Carter.

a. Ultimate Object is: 'some papers ...'

b. Enter State 2.

c. Transformed message is same as original.

2) Give me the papers written by Carter.

a. Ultimate Object is: 'me the papers ...'

b. Enter State 2.

c. Transformed message is: I want the papers
written by Carter.

3)  What  has  Carter  written? 

a.  Enter  State  3. 

b.  'Argument':  200  Carter  101 

4) Have you anything written by Carter?

a.  Ultimate  Object  is:  'anything  ...' 

b.  Enter  State  2. 

c.  Transformed  message  is:  I  want  anything  written 
by  Carter. 

5) What books do you have which were written by Carter?

a. Ultimate Object is: 'books which were written
by Carter.'

b. Enter State 2.

c. Transformed message is (in two steps):

1. What do you have which was written by Carter?*

2. I want books which were written by Carter.

* A note is made of books.


6)  Do  you  have  any  papers  that  Carter  wrote? 

a.  Ultimate  Object  is:  'any  papers  that  Carter  wrote'. 

b.  Enter  State  2. 

c.  Transformed  message  is:  'I  want  any  papers  that 
Carter  wrote' . 

7)  How  are  radar  and  sonar  defined? 

a.  Enter  State  4. 

b.  'Argument':  radar,  sonar. 

4.3  Command-Set  Generator 

The Command-Set Generator, or State 2, is called upon if the
Ultimate Object Analysis failed to determine the complete
command-set necessary for the proper execution of the user's request.
State 2 starts its analysis at the ultimate object node of the
transformed message. In all but exceptional cases the ultimate
object is a noun (or pronoun) phrase (the noun of which is called
the core) whose adjuncts are informational clauses. The pragmatic
content of these nouns is coded and placed with its word dictionary
record. Words like information, data, material, stuff offer
no clue as to the desired mode of operation, whereas words like
papers, words, definition, author carry definitive pragmatic
information. In fact the nouns appearing as the core of an
ultimate object may be classified into one of three groups:
1) no specific mode: anything, something, 2) non-search mode:
words, definition, phrases, and 3) search mode: documents, papers,
books. The right adjuncts of the core noun are then analyzed, one
by one in order of appearance in the sentence, until the
command-set is established. Each individual right adjunct carries with
it its own decoding scheme based upon the atomics in its
substructure and the subject-verb-object or verb-object relationship
of the adjunct. Examples follow.

4.3.1 The Ven Phrase and Pure Prepositional Phrase

The mechanism involved in the decoding of prepositional
phrases is embedded into that of the Ven phrase (i.e. past
participle + Cl passive). The past participles (Ven) encountered
in such environments are classified as search mode oriented (BVC
in word record) or relation mode oriented (RVC). Within the
search mode, this Ven may signal the execution of either the
NUMBER, COMBINE, or FORM command. The ambiguity is resolved by
the prepositional phrase associated with the Ven as either a
right adjunct of this verb, B43, or as the passive object of this
verb, B99. Notice that at this point, the prepositional phrase
may be separately analyzed as such as long as the presence of the
associated Ven is taken into account. The processing of the
prepositional phrase takes into account 1) the preposition itself,
2) the associated verb, if any, which may be Ven, Ving, tV,
3) the object of the preposition (it may be an index term which
may or may not indicate a date, or it may be another noun phrase),
4) the presence anywhere in the sentence of a phrase which would
specifically direct the system to a particular mode of operation,
such as 'in the thesaurus' in 'Give me everything in the thesaurus
about radar.', and 5) the group into which the core of the ultimate
object is classified.

As an example consider the word record, shown in Figure 18,
of about.

■; 

WORD 

ABOUT 

4 

LISTIS 

(,15,A24) 

.15 

LISTIS 

( .1.TYP1, .2.TYP2, .3,TYP3) 

rS 

.1 

LISTIS 

( .11,V0ID, .11,BVC, .lU,TVC) 

7 

.11 

LISTIS 

( (AS ) ,TD1, (Al) ,TD3, (Al) ,TD4) 

i 

.14 

LISTIS 

( (All) ,TD1, (All) ,TD3, (All) ,TD4; 

.2 

LISTIS 

( .21, VOID, .21,BVC, .1U,TVC) 

- 

.21 

LISTIS 

( (A5),TD1,  (A5),TD3, (A5) ,TD4) 

- 

.3 

LISTIS 

(.31,VOID,.31,BVC) 

.31 

LISTIS 

((A1),TD3,(A1),TD4) 

- 

END 

ABOUT 

■- 

Figure  18 

The  group  into  which  the  core  of  the  ultimate  object  is 
classified  is  recorded  as  follows: 

The OBTYP (object type) variable has the values:

TYP1 - no specific mode
TYP2 - non-search mode
TYP3 - search mode

The associated verb is indicated by the value of the PREVB
(previous verb) variable as follows:

VOID - no associated verb

BVC - verb associated with bibliographic data
(e.g. written)

RVC - verb associated with relational data
(e.g. related)

TVC - verb associated with thesaurus or lexicon data
(e.g. beginning)

The presence of particular phrases directing the system to a
particular mode of operation is indicated by the variable TDICT
(thesaurus-dictionary) as follows:

TD1 - indicates THESAURUS mode

TD2 - indicates DEFINE mode

TD3 - indicates no specific mode and the index term
is not a date

TD4 - indicates no specific mode and the index term
is a date

Now, consider the following sentences.

1) I want anything about radar.

2) I want any papers about radar.

3) I want anything about radar in the thesaurus.

In the analysis of these three sentences the ultimate object
would be respectively: 'anything about radar', 'any papers about
radar', and 'anything about radar in the thesaurus'. The
respective values of 1) OBTYP are TYP1, TYP3, and TYP1, 2) PREVB
are VOID, VOID, and VOID, and 3) TDICT are TD3, TD3, and TD1. In each
case, the Command-Set Generator will analyze the prepositional
phrase 'about radar'. All the necessary information for this
analysis is stored in the word dictionary record of the
preposition in tree-like fashion.

The  process  is  as  follows: 

1)  Start  at  the  sublist  of  the  preposition  (A24). 

In  this  case  .15  (refer  to  Figure  18). 

2)  Go  to  the  sublist  of  the  symbol  that  is  the  value  of 
the  variable,  OBTYP, 

3) Go to the sublist of the symbol that is the value of the
variable, PREVB.

4) Go to the sublist of the symbol that is the value of the
variable, TDICT.

5) This sublist contains a symbol Ax, where x is a number
from 1 to 11. The value of x corresponds to the xth
bit position of the CMAND1 variable, which is one of two
variables (the other is CMAND2) used to specify the
command-set. Each bit corresponds to a different
command, as shown below:


CMAND1

Bit Number    Command
     1        NUMBER
     2        COMBINE
     3        SYNONYM
     4        DEFINE
     5        RELATION
     6        NOT USED
     7        THES/BF
     8        THES/AF
     9        THES/X
    10        THES/BT
    11        THES/AR
    12        FORM

The bit positions of CMAND2 correspond to the commands:
AUTHOR, DATE, TITLE, EDITOR, PUBLISHER, JOURNAL, SPECIFIC,
GENERIC, AUTHORITY LIST ENTRY, ABSTRACT, DESCRIPTORS, DESC/ALL,
DESC/BIBLIO.

Using this scheme, sentences 1 and 2 will set the NUMBER
command, and sentence 3 will set the RELATION command. Although
sentences 2 and 3 are interpreted correctly, it may be argued
that sentence 1 could be requesting information concerning 'radar'
from any of the mode files. In this sense, sentence 1 is
ambiguous. Experience will help decide the eventual course to
take in such cases. The choice selected here is based on the
experiences of the author. It may be that 1) a dialogue between
user and machine should be initiated to resolve the ambiguity, or
2) a record of past performance of the user might resolve the
ambiguity, or 3) the choice selected above is used in the vast
majority of cases so as not to warrant the time-consuming (and
in some cases, annoying) dialogue mentioned above.
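The five-step decoding procedure above can be sketched in modern terms. The following Python fragment is only an illustration, not the FORTRAN IV implementation: the nested dictionaries restate the sublists of Figure 18, and the names ABOUT, decode and set_bit are hypothetical. Counting bit 1 as the least significant bit of CMAND1 is likewise an assumption, since the text does not state the bit order.

```python
# Word record for 'about' (Figure 18) restated as nested dicts:
# OBTYP value -> PREVB value -> TDICT value -> x of the symbol Ax.
S11 = {"TD1": 5, "TD3": 1, "TD4": 1}
S14 = {"TD1": 11, "TD3": 11, "TD4": 11}
S21 = {"TD1": 5, "TD3": 5, "TD4": 5}
S31 = {"TD3": 1, "TD4": 1}
ABOUT = {
    "TYP1": {"VOID": S11, "BVC": S11, "TVC": S14},
    "TYP2": {"VOID": S21, "BVC": S21, "TVC": S14},
    "TYP3": {"VOID": S31, "BVC": S31},
}

# Only the CMAND1 bits needed for the three sample sentences are named here.
CMAND1_NAMES = {1: "NUMBER", 2: "COMBINE", 3: "SYNONYM", 4: "DEFINE", 5: "RELATION"}


def decode(record, obtyp, prevb, tdict):
    """Follow OBTYP, then PREVB, then TDICT down the record; return bit number x."""
    return record[obtyp][prevb][tdict]


def set_bit(cmand1, bit):
    """Set bit number `bit` (1-based; bit 1 assumed least significant)."""
    return cmand1 | (1 << (bit - 1))


# The three sample sentences: (OBTYP, PREVB, TDICT)
for values in [("TYP1", "VOID", "TD3"), ("TYP3", "VOID", "TD3"), ("TYP1", "VOID", "TD1")]:
    print(values, "->", CMAND1_NAMES[decode(ABOUT, *values)])
```

Under these assumptions the loop reports NUMBER, NUMBER and RELATION, in agreement with the interpretation of sentences 1, 2 and 3 given above.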

If the object of the preposition is itself a noun phrase,
then the possibility exists of setting the COMBINE command.
Sentences translatable into the COMBINE command have a quantifier
sequence (£24) modifying a zero noun* (A62), as in:

Give me the papers indexed by not more than 3
of the terms: A,B,C,D,G.

The sequence 'not more than 3' modifies a zero noun whose right
adjunct OlUl) is the prepositional phrase 'of the terms'.


4.3.2 Adjectival Phrases

Certain adjectives indicate the desired mode of operation.
In such cases, the adjective's word record carries the
information in the sublist of the COM symbol which occurs on the sublist
of the category, adjective (A15). Consider the record of
'synonymous' below.

WORD    SYNONYMOUS
        LISTIS  (.1,A15)
.1      LISTIS  (.2,COM)
.2      LISTIS  (Y)

The Y indicates that the synonym command is to be set. The
adjective phrase holding the adjective under consideration will
also contain the words involved. In such a case, the Index Term
Lister Package will then place the index terms into the 'ARGUMENT'
buffer.


*  A  zero  noun  indicates  that  a  noun  that  does  not  necessarily 
have  to  occur  at  a  certain  point  in  the  sentence,  did  not 
occur  as  in  'Those  two  were  not  there. '  The  noun  which 
'those  two'  modifies  is  said  to  be  zeroed. 


4.3.3 The Relative Clause - That + C1 with Noun Omission (C69)

The diversity in the subject-verb-object relationship
applicable to the C69 string makes this string capable of appearing in
requests involving all the various modes of operation. The
analysis may be divided into three sections depending upon the
subject of the C1 string.

a) Subject is 'you' or 'there'

Examples include:

Give me all that you have on radar.

I want anything that there is concerning the field
of optics.

In such cases, the object is the omitted string (A61). Its
right  adjuncts  may  then  be  treated  as  if  they  had  occurred  alone. 

b)  Subject  contains  an  index  term 

Examples  include: 

I want anything that Jones is the author of.

I  want  anything  that  radar  is  generic  to. 

Do  you  have  anything  that  Jones  has  written  dealing 
with  radar? 

The object of the verb in this adjunct contains the key to
its interpretation. The noun author (which has the subcategory
BN, bibliographic noun, in its word record) indicates the NUMBER
command. Generic in the second sentence makes the analysis
similar to that explained in Section 4.3.2 except that the
opposite relation is required here. That is, the sentence:

'Give me all the words generic to radar.'

requires the inverse relation of that necessary for the sentence:

'Give me all the words that radar is generic to.'

Referring to the third example, written with its BVC subcategory
indicates the NUMBER command is involved.

c) Subject is omitted

Examples include:

Give me everything that is generic to radar.

Give me anything that has been written describing radar.

What do you have that has Jones as the author?

What words are there that begin with the letters ST?

In these cases, either the verb (as in the last sentence) or
its object (as in the other sentences) carries the distinguishing
information.

4.3.4 Summary

It is to be noted that in the entire Command-Set Generator
analysis those words indicative of the system commands and
syntactical structures carry the clues to the interpretation. The
analyzer uses the parse generated by the syntax analyzer to
determine the environment in which these words are used. Based upon
the environment found and the subcategories stored in the words'
dictionary records, the command-set is formed. In the commands
associated with the CMAND1 variable, all but NUMBER and COMBINE
would require the execution of the Index Term Lister Package in
order to fill the 'ARGUMENT' buffer with the appropriate index
terms. Recognition of a NUMBER or COMBINE command would cause
the system to enter State 3 which will form the specification
part of the command, as will be explained in Chapter 5.

The above discussion dealt with the adjuncts of core nouns
indicative of no specific mode of operation. However, there are
nouns which do indicate the entire command-set or only part of it.

The N48 nouns (indicating DEFINE mode) and the N49 nouns
(indicating SYNONYM mode) yield a quick analysis when they
occur as the core noun as in:

What is the meaning of radar?

Give me some synonyms of radar.

I want the definition of the following words:
radar, sonar, and laser.

The bibliographic nouns (BN) cause a CMAND2 command to be
set. The BN subcategory of the word (e.g. author, editor) carries
its own sublist indicating the bit to be set in the CMAND2
variable. Once the command(s) corresponding to these BN nouns have
been set, the analysis continues as above to find other commands
that might be required.

Consider:

Give me the author of any papers dealing with radar.

The BN noun, author, causes the AUTHOR command to be set.
Analysis would continue interpreting 'any papers dealing with
radar' as if it were part of the sentence 'I want any papers
dealing with radar'.

The various word categories used in the analysis are shown
in Appendix D.


CHAPTER  5 


SPECIFICATION FILLER AND ORGANIZER

5.1 Introduction

If the command-set generated by either the Ultimate Object
Analysis or the Pragmatic Analysis includes the NUMBER or COMBINE
command, then the Specification Filler must be executed to
establish its specification part before the final output commands
can be formed. This specification part, which includes the
association of a sector designator with each index term and the formation
of the implied logical construction of the request by the proper
placement of parentheses for grouping and of the logical symbols
&, +, ¬, is formed by the system in State 3, the Specification
Filler State. After the completion of State 3, the Organizer
(State 4) forms the various commands together with their
specification parts in the output buffer in the proper sequence.

5.2  Specification  Filler 

Before execution of State 3, the previously active state has
determined the proper starting node (command-formatter node) for
the specification analysis. In some cases, part of the analysis
has been made (as in the example on page 77), and the results placed
in the 'ARGUMENT' buffer.

The Specification Filler analysis includes:

1) the formation of a sequence of codes representing the
significant words of the informational clauses of the
request starting from the command-formatter node.

2) the manipulation of this sequence in order to associate
each index term with a code representing the appropriate
sector designator. Also, as is required by the syntax
rules of the commands, all index terms must sequentially
follow their associated sector designation code. In
addition, multi-word conjunction (e.g. and either)
codes are replaced by one representing the collective
action of the conjunction.

3) the maintenance of the logical construction implied by
the original request. Any ambiguity inherent in the user's
message is resolved on the basis of a hierarchy scheme
for conjunctions. All codes representing parentheses,
logical symbols, and sector designators are replaced by
their actual representation as required by the command's
syntax.

5.2.1  SEM-Value  Extractor 

Code numbers for all the significant words that occur in the
informational clauses of a NUMBER or COMBINE command are stored
in that word's dictionary record in the sublist of the SEM category
which is itself found on the sublist of the category chosen for
the word. Some words (e.g. written) have more than one code
number (or SEM-value) indicating that the proper value to be
used depends upon the context in which this word is used. Also
some words (e.g. the) have no SEM sublist at all indicating that
their presence in the string (although necessary for syntactical
purposes) reveals no information useful for command formation.
Words indicative of a sector designator that can occur both
before and after their associated index terms have a multi-valued
SEM sublist. The word 'written', which may occur before an index
term as in 'written on radar' or after an index term as in 'that
Jones has written', has a SEM value associated with each case.

It is the purpose of the SEM-Value Extractor to resolve all
ambiguities through the tree produced by the syntax analyzer.

At the conclusion of the SEM-Value Extractor, the 'ARGUMENT'
buffer contains the SEM-value of all words occurring in the
informational clauses of the request.
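The effect of the SEM-Value Extractor may be illustrated by a short Python sketch. It is a simplification under stated assumptions, not the procedure of Real English: the real extractor resolves multi-valued SEM sublists through the parse tree, whereas the sketch merely checks whether an index term was the last significant item seen. The SEM values are those listed in Section 5.2.2 (with 101 for 'written' following its index term, as in Figure 19); the names SEM, INDEX_TERMS and extract_sem, and the four-digit test for dates, are hypothetical.

```python
# SEM values as listed in Section 5.2.2; 'written' is multi-valued
# (1 before its index term, 101 after it, per Figure 19).
SEM = {
    "written": (1, 101), "edited": (2,), "published": (3,),
    ",": (51,), "or": (53,), "either": (55,),
    "by": (22,), "in": (20,), "that": (82,),
}
INDEX_TERMS = {"smith", "jones"}  # hypothetical stand-in for the index term list


def extract_sem(tokens):
    """Return the 'ARGUMENT' buffer (codes and index terms) for a token list."""
    buffer = []
    after_term = False  # True while the last significant item was an index term
    for tok in tokens:
        low = tok.lower()
        if low in INDEX_TERMS:
            buffer += [200, tok]      # 200 marks a non-date index term
            after_term = True
        elif low.isdigit() and len(low) == 4:
            buffer += [201, tok]      # 201 marks a date index term
            after_term = True
        elif low in SEM:
            values = SEM[low]
            # Crude context test standing in for the parse-tree lookup.
            buffer.append(values[1] if len(values) > 1 and after_term else values[0])
            after_term = False
        # Words with no SEM sublist (e.g. 'the', 'me') contribute nothing.
    return buffer


print(extract_sem("Give me anything written , edited or published by either Smith or Jones".split()))
print(extract_sem("What has Jones written that was published in 1967".split()))
```

The two calls correspond to Examples 1 and 2 of Section 5.2.2, with the comma supplied as a separate token.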


5.2.2  Associating  Mechanism 

The Associating Mechanism associates the various index
terms, as represented in the 'ARGUMENT' buffer, with the proper
sector designator code and does so ensuring that the index terms
follow their sector code in the 'ARGUMENT' buffer.

Some  examples  follow  to  help  bring  out  the  methods  used. 
Throughout  these  examples,  the  following  SEM  values  were  used. 


WORD          SEM VALUE
WRITTEN       1
EDITED        2
PUBLISHED     3
SMITH         200 followed by Smith
JONES         200 followed by Jones
,             51
OR            53
BY            22
EITHER        55
1967          201 followed by 1967
THAT          82
PERIOD        99
IN            20

Example 1: Give me anything written, edited or published by
either Smith or Jones.

The Ultimate Object Analysis establishes the ultimate object
as being 'anything ...'. The Pragmatic Analysis causes the
NUMBER command to be set by virtue of the respective values of
OBTYP, PREVB, TDICT and the fact that the index terms do not
represent a date, as explained in Section 4.3.1. As a result of
the SEM-Value Extractor, the 'ARGUMENT' buffer contains the
sequence:

1 51 2 53 3 22 55 200 Smith 53 200 Jones

Because 'written' occurs preceding its index terms (a.i.b. 1)*
there must be an associated preposition. The next element in the
sequence being a conjunction† instead of the preposition indicates
that a sequence of BVC words is present. The system will now
associate the preposition 'by' (a.i.b. 22) and its following index
term sequence - 55 200 Smith 53 200 Jones - with each of
the BVC words written, edited (a.i.b. 2) and published (a.i.b. 3).
The code representing the sector designator for 'written by' is
obtained from the sublist of the SEM value of written. Referring
to the word record of written (Figure 19), this sublist contains
the code for the preposition involved, viz. 22*. This 22 has a 2
on its sublist. The 2 represents the sector designator (in this
case, AUTHOR). If the 22 had no sublist, as is the case of 20
(in), the system must make a further study of the sequence to
determine the sector designator.

* a.i.b. is an abbreviation for 'as indicated by the'.

† All codes 51-70 indicate a conjunction and 20-49 indicate
a preposition.


WORD    WRITTEN
        LISTIS  (.1,A32)
.1      LISTIS  (.4,SC,.2,PO,.3,BO,.15,SEM,BVC,(A1),BV)
.15     LISTIS  (.31,N1,(N2),N101)
.31     LISTIS  ((N2),N22,N20,N21,(N14),N24,(N14),N25,
        LISTIS  N26,(N12),N27,(N11),N28,N29,N30,
        LISTIS  (N11),N31,(N12),N32,(N15),N33)
END     WRITTEN

Figure 19


* N22 actually appears. The N is necessary for program
considerations, but the actual list will have 22.

Therefore the resulting 'ARGUMENT' buffer is:

2 55 Smith 53 Jones 51 4 55 Smith 53 Jones
53 5 55 Smith 53 Jones

The 200 code indicating a non-date index term has been
eliminated. The numbers (1-15) above indicate sector designators
and are no longer SEM values. The above sequence is used to form
the proper logical constructions by the placement of parentheses,
and then the proper syntactical symbols replace all codes as will
be discussed in Section 5.2.3.
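The association step of Example 1 can be restated as a small Python sketch. It covers only the pattern of this example (a run of BVC words joined by conjunctions, followed by one preposition and its index term sequence) and is offered as an illustration rather than as the method of the program. The table SECTOR_FOR condenses what the word records of Figures 19 and 20 supply for 'written by' and 'published by'; the entry for 'edited by' is inferred from the resulting buffer above, and all function names are hypothetical.

```python
# (SEM value of the BVC word, preposition code) -> sector designator code.
# 'written by' -> 2 (AUTHOR), 'edited by' -> 4 (EDITOR), 'published by' -> 5 (PUBLISHER).
SECTOR_FOR = {(1, 22): 2, (2, 22): 4, (3, 22): 5}


def is_preposition(code):
    return isinstance(code, int) and 20 <= code <= 49


def is_conjunction(code):
    return isinstance(code, int) and 51 <= code <= 70


def associate(buffer):
    """Attach the preposition's index term sequence to every BVC word before it."""
    p = next(i for i, c in enumerate(buffer) if is_preposition(c))
    prep = buffer[p]
    # The 200 marker (non-date index term) is dropped at this stage.
    terms = [c for c in buffer[p + 1:] if c != 200]
    out = []
    for code in buffer[:p]:
        if is_conjunction(code):
            out.append(code)
        else:
            out.append(SECTOR_FOR[(code, prep)])
            out.extend(terms)
    return out


example_1 = [1, 51, 2, 53, 3, 22, 55, 200, "Smith", 53, 200, "Jones"]
print(associate(example_1))
```

Each sector code is followed immediately by its own copy of the index term sequence, which is the ordering the command syntax requires.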

Example 2: What has Jones written that was published in 1967?

As a result of the analysis carried out in States 1 and 2,
the command-formatter node is 'that ...' and the 'ARGUMENT'
buffer before execution of State 3 contains:

200 JONES 101

As a result of the SEM-Value Extractor, the 'ARGUMENT' buffer
contains:

200 JONES 101 82 3 20 201 1967

Because written occurs following its index term (a.i.b. 101),
the sector designation code must be placed into a position preceding
the index term. The sublist of written's SEM value indicates the
sector designation code. As seen from Figure 19, its value is 2.
Therefore at the conclusion of the analysis of the first
informational clause, the 'ARGUMENT' buffer contains:

2 JONES 3 20 201 1967

The 82 (representing that) is used to analyze informational
clauses similar to 'that has Jones as the author', i.e. in
cases where the sector designator is represented by a noun
instead of a verb as in this case. Therefore in this case
the 82 is ignored.

The 3 (representing published) is treated similarly to that
of written in Example 1, i.e. the code for the following
preposition is looked up in the sublist of the 3 in the word record of
published (see Figure 20).


WORD    PUBLISHED
        LISTIS  (.1,A32)
.1      LISTIS  (.15,SEM,BVC,.5,SC,(Aft),BV,.2,BO,.3,PO)
.15     LISTIS  (.31,N3,(N5),N103)
.31     LISTIS  (N20,N21,(N5),N22,(N14),N24,(N14),N25,
        LISTIS  N26,(N12),N27,(N11),N28,N29,N30,(N11),
        LISTIS  N31,(N12),N32,(N15),N33)
END     PUBLISHED

Figure 20


The sublist in question is the .31 list. The preposition to
be found is 20, which can be seen to have no sublist. This
indicates that further analysis is needed to determine the
sector designator code. The system uses the remaining portion of
the 'ARGUMENT' buffer to distinguish between structures like:

1) published in 1967

2) published in the ACM

3) published in the period 1957 to 1961

4) published in 1957-63

5) published in the 1950's

The 'ARGUMENT' buffer in each of these cases would be,
respectively:

1')  3  20  201  1967
2')  3  20  200  ACM
3')  3  20  99   201
4')  3  20  204  1957-63
5')  3  20  202  1950's

It can be seen that each case has its own distinguishing
features which are used to determine the appropriate sector
designation code. These five clauses represent respectively
papers published in the single year 1967, papers appearing in the
ACM publication, papers published in any year between 1957 and
1961, papers published in any year between 1957 and 1963 but
expressed as a hyphenated date, and papers published in the
decade starting at 1950. The sector designator codes applicable
in these cases are respectively 9 (indicating exact date),
8 (indicating journal publication), 14 (indicating interval of
dates given the two end points), 10 (indicating hyphenated dates),
13 (indicating a decade of dates).
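The five cases can be told apart by the code that follows the preposition code 20 in the buffer. The Python sketch below records only that correspondence, as read from the five buffers listed above; the text does not spell out the exact tests the program applies, so the function name and the decision to inspect only the first following code are assumptions.

```python
# First code after 20 ('in') -> sector designator code, read from buffers 1')-5').
SECTOR_AFTER_IN = {
    201: 9,    # date index term          -> DATE - EXACT
    200: 8,    # non-date index term      -> JOURNAL OCCURRENCE
    99: 14,    # 'period'                 -> DATE - INTERVAL
    204: 10,   # hyphenated date          -> DATE - HYPHENATED
    202: 13,   # decade                   -> DATE - DECADE
}


def sector_after_in(rest):
    """rest: the buffer elements that follow the code 20."""
    return SECTOR_AFTER_IN[rest[0]]


for rest in ([201, "1967"], [200, "ACM"], [99, 201], [204, "1957-63"], [202, "1950's"]):
    print(rest, "->", sector_after_in(rest))
```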

Therefore, returning to the example at hand, the 'ARGUMENT'
buffer would be:

2 JONES 9 1967

Every informational clause has been reduced to its index
term sequence preceded by the appropriate sector code designator.
A complete list of the sector code designators follows:

CODE    SECTOR DESIGNATOR
  2     AUTHOR
  3     TITLE
  4     EDITOR
  5     PUBLISHER
  6     DESCRIPTOR/ABSTRACT
  8     JOURNAL OCCURRENCE
  9     DATE - EXACT
 10     DATE - HYPHENATED
 11     DATE - MINIMUM
 12     DATE - MAXIMUM
 13     DATE - DECADE
 14     DATE - INTERVAL

The complete list of SEM values appears in Appendix E.

5.2.3 Logical Maintenance

This step in the analysis performs the following functions:

1) lists all the required dates explicitly in the
'ARGUMENT' buffer in the cases in which the sector
designator code is between 10 and 14 inclusive.

2) places parenthesis codes in the buffer to maintain the
implied logical construction.

As an example, consider 'I want anything written after 1966
dealing with either nylon, rayon and dacron or wool, but not
published by Stevens McGill.' Following the procedures outlined
previously, the 'ARGUMENT' buffer as a result of the SEM-Value
Extractor would be:


1 28 201 1966 71 36 55 200 NYLON 51 200 RAYON 52 200 DACRON
53 200 WOOL 51 54 60 3 22 200 STEVENS MCGILL

As a result of the Associating Mechanism, the 'ARGUMENT' buffer
would be:

11 1966 6 55 NYLON 51 RAYON 52 DACRON 53 WOOL 65 5 STEVENS MCGILL

The 11 indicates that 1966 is the minimum year desired so
that 1967, 1968, and 1969 will be put into the final command
along with 1966, all joined by the logical or (+). The 6 indicates
that the following index term sequence refers to descriptors, and
the 5 indicates that Stevens McGill is a publisher.
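The explicit listing of dates can be sketched as follows. This Python fragment is an illustration under stated assumptions: the text shows only the minimum-date case (code 11), so the handling of hyphenated dates (10) and decades (13) is an extrapolation from their descriptions, and the latest year on file is taken to be 1969 because the example stops there. Code 114 is the code for the logical or (+), and the listed dates are carried under the exact-date code 9; the function name is hypothetical.

```python
OR_CODE = 114     # code for the logical or (+)
EXACT_DATE = 9    # sector code under which the explicit years are listed


def expand_dates(sector, term, latest_year=1969):
    """Return the buffer fragment listing every year implied by a date term."""
    if sector == 11:                      # DATE - MINIMUM, e.g. '1966'
        years = range(int(term), latest_year + 1)
    elif sector == 10:                    # DATE - HYPHENATED, e.g. '1957-63'
        start, end = term.split("-")
        last = int(start[: len(start) - len(end)] + end)
        years = range(int(start), last + 1)
    elif sector == 13:                    # DATE - DECADE, e.g. "1950's"
        first = int(term[:4])
        years = range(first, first + 10)
    else:
        raise ValueError("sector code not handled in this sketch")
    out = [EXACT_DATE]
    for i, year in enumerate(years):
        if i:
            out.append(OR_CODE)
        out.append(str(year))
    return out


print(expand_dates(11, "1966"))
```

For the example at hand the call yields 9 1966 114 1967 114 1968 114 1969.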

All the logical and control symbols used at this stage in
the analysis are represented by codes as follows:


(  -  110          +  -  114
)  -  111          &  -  115
[  -  112          ¬  -  116
]  -  113          /  -  117

After the required dates are listed explicitly, the 'ARGUMENT'
buffer contains:

9 1966 114 1967 114 1968 114 1969 6 55 NYLON 51 RAYON 52 DACRON
53 WOOL 65 5 STEVENS MCGILL

The placement of logical symbols must be made to maintain
the implied logic both between informational clauses and within
any given clause. The conjunctions are divided into two groups:
those conjunctions preceded by a comma and those not preceded by
a comma are respectively Group 1 and Group 2. Any Group 1
conjunction has a higher priority (i.e. will be considered first
in this analysis) than any Group 2 conjunction. Within any group
the order of decreasing priority is:

as well as
both ... and ...
not only ... but also
but neither / and neither
and either
or either
either
and also
but not / and not
but
not
nor
or
and

If two connectives have the same priority they are operated
upon as they appear in the sentence reading from left to right.
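The ordering rule just stated amounts to a three-part sort key: group, then rank within the priority list, then position in the sentence. The Python sketch below illustrates this; it is not taken from the program, the names are hypothetical, and it does not model the special treatment of a comma standing by itself, which is discussed with the example that follows.

```python
# Priority list from the text, highest priority first; entries sharing a line
# in the list share a rank.
PRIORITY = [
    ("as well as",), ("both ... and",), ("not only ... but also",),
    ("but neither", "and neither"), ("and either",), ("or either",),
    ("either",), ("and also",), ("but not", "and not"), ("but",),
    ("not",), ("nor",), ("or",), ("and",),
]
RANK = {name: i for i, names in enumerate(PRIORITY) for name in names}


def processing_order(connectives):
    """connectives: (text, preceded_by_comma) pairs in sentence order.

    Group 1 (preceded by a comma) comes before Group 2; within a group the
    priority list decides; ties are broken left to right.
    """
    keyed = [
        (0 if comma else 1, RANK[text], position, text, comma)
        for position, (text, comma) in enumerate(connectives)
    ]
    return [(text, comma) for _, _, _, text, comma in sorted(keyed)]


sample = [("either", False), ("and", False), ("or", False), ("but not", True)]
print(processing_order(sample))
```

With this sample, ', but not' is taken first because it alone belongs to Group 1, even though 'either' stands higher in the priority list.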

Under this scheme, the connective ', but not' is taken
first. As with every connective, it must be determined whether
the connective joins two informational clauses or is within a
single informational clause. In this case, the former is true.
Therefore, the 'ARGUMENT' buffer contains:

9 1966 114 1967 114 1968 114 1969 6 55 NYLON 51 RAYON 52 DACRON
53 WOOL 111 116 110 5 STEVENS MCGILL

The matching parentheses are placed adjacent to a previously
placed parenthesis, if any exists, or else at the beginning
and end of the sequence. Therefore, the buffer contains:

110 9 1966 114 1967 114 1968 114 1969 6 55
NYLON 51 RAYON 52 DACRON 53 WOOL 111 116
110 5 STEVENS MCGILL 111

The comma which is by itself is the next connective to be
considered. Since its environment implies an and construction,
this comma is treated as such resulting in:

110 9 1966 114 1967 114 1968 114 1969 6 55
NYLON 115 RAYON 52 DACRON 53 WOOL 111 116
110 5 STEVENS MCGILL 111

The next connective, 'either ... or', is within an
informational clause and as such uses brackets for purposes of grouping
resulting in:

110 9 1966 114 1967 114 1968 114 1969 6 112
NYLON 115 RAYON 52 DACRON 113 114 112 WOOL 113 111 116
110 5 STEVENS MCGILL 111

After the connectives are operated upon, the 'ARGUMENT' buffer
contains the following [the codes representing the logical and
control symbols are still present but their actual symbol is used
below]:

( 9 1966 + 1967 + 1968 + 1969 6 [ [
NYLON & RAYON ] & [ DACRON ] ] + [ WOOL ]
) ¬ ( 5 STEVENS MCGILL )

It should be noticed that the informational clauses 'after
1966' and 'dealing with ...' are not joined by a connective, in
violation of the syntax rules. Therefore such a situation will
be treated as an 'and' connective between informational clauses,
resulting in:

( 9 1966 + 1967 + 1968 + 1969 ) & ( 6 [ [
NYLON & RAYON ] & [ DACRON ] ] + [ WOOL ]
) ¬ ( 5 STEVENS MCGILL )

The matching parentheses of the just treated 'and' are
placed adjacent to the closest left and right parenthesis from
this 'and' as seen below. Note that if there were no distinction
between parentheses and brackets, confusion would result. The
final 'ARGUMENT' buffer is:

( ( 9 1966 + 1967 + 1968 + 1969 ) & ( 6 [ [
NYLON & RAYON ] & [ DACRON ] ] + [ WOOL ]
) ) ¬ ( 5 STEVENS MCGILL )

At this point, all logical and control codes and sector
designation codes are replaced by their actual representation
yielding a final specification part:

( ( DATE 1966 + 1967 + 1968 + 1969 ) & ( DESC
( ( NYLON & RAYON ) & ( DACRON ) ) + ( WOOL ) ) ) ¬ ( PUBL
STEVENS MCGILL )

It should be noted that the above connective mechanism is
limited as to the occurrence of a higher priority connective
within the scope of a lower priority scope-marked connective.
That is, in the above example if it had been: 'I want anything
written after 1966 dealing with either nylon, rayon, and dacron
or wool, but not published by Stevens McGill.', the comma following
'rayon' would cause the connective 'and' to be executed before
'either ... or', thereby causing an incorrect grouping.
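The final replacement of codes by their actual representation is a table lookup, as the following Python sketch suggests. It is illustrative only: the symbol codes are those tabulated in Section 5.2.3, the abbreviations DATE, DESC and PUBL are those appearing in the final specification part above, and brackets are printed as parentheses because that is how they appear there. The glyph for code 116 is difficult to read in the source and is assumed here to be '¬'. Distinguishing index terms from codes by storing the former as strings is a convenience of the sketch, and the name render is hypothetical.

```python
# Logical and control codes -> printed symbol (116 assumed to print as '¬').
SYMBOLS = {110: "(", 111: ")", 112: "(", 113: ")", 114: "+", 115: "&", 116: "¬"}
# Sector designator codes -> abbreviations seen in the final specification part.
SECTORS = {5: "PUBL", 6: "DESC", 9: "DATE"}


def render(buffer):
    """Replace every code in the buffer; index terms (strings) pass through."""
    parts = []
    for item in buffer:
        if isinstance(item, str):
            parts.append(item)
        elif item in SYMBOLS:
            parts.append(SYMBOLS[item])
        else:
            parts.append(SECTORS[item])
    return " ".join(parts)


final_buffer = [
    110, 110, 9, "1966", 114, "1967", 114, "1968", 114, "1969", 111, 115,
    110, 6, 112, 112, "NYLON", 115, "RAYON", 113, 115, 112, "DACRON", 113,
    113, 114, 112, "WOOL", 113, 111, 111, 116, 110, 5, "STEVENS", "MCGILL", 111,
]
print(render(final_buffer))
```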

5.3 Organizer

The function of the Organizer is to form the output buffer
with the selected commands and their associated specification
parts. The commands are placed in the order necessary to perform
the request. For example, if the request is:

'Give me the author of anything on optical scanning.'

then the output buffer would contain:

NUMBER DESC OPTICAL SCANNING ** AUTH ** 0

The NUMBER command will form a list of document numbers each
of which has been indexed by 'optical scanning' as subject matter.
The AUTH command will then give the user the author of each
document in the list formed by the NUMBER command.


CHAPTER 6

CONCLUSIONS AND FUTURE RESEARCH SUGGESTIONS

6.1  General  Conclusions 

Real  English  has  been  designed  for  use  in  the  information 
retrieval  system  of  the  Moore  School  Information  Systems  Labora¬ 
tory.  It  is  prograraned  on  the  RCA  Spectra  70/1*6  entirely  in  the 
FORTRAN  TV  language.  At  present,  it  is  a  stand  alone  package  and 
as  such  inputs  its  user  messages  through  a  card  reader  and  outputs 
the  translated  Symbolic  Command  Language  consnanda  to  the  line 
printer.  Described  below  are  sample  dialogues  based  upon  the 
indicated  messages  to  illustrate  the  various  capabilities  of  the 
system.  In  the  accompanying  figures,  the  system  responses  are 
indicated  by  an  asterisk  (*)  at  the  left  margin^.  The  translated 
Symbolic  Command  Language  commands  are  shown  in  each  case  and  are 
indicated  by  a  slash  (/)  at  the  left  margin. 

Incorporation  of  the  Real  English  package  into  an  on-line 
information  retrieval  system  which  has  taken  into  account  the 
linguistic  style  of  its  users  and  the  complete  set  of  Bystem 
commands  will  enhance  this  system's  natural  language  man-machine 
conversational  capabilities.  The  user  will  be  free  to  use 
messages  whose  pragmatic  interpretation  is  dependent  upon  the 
previous  dialogues.  Furthermore,  the  system  will  be  able  to 
recognize  messages  that  are  not,  strictly  speaking,  sentences. 

T  The  system  responses  are  actually  taken  from  the  present  MSISL 
information  retrieval  system  for  which  Real  English  was 
designed.  They  are  included  to  produce  typical  dialogues  based 
upon  the  supplied  user  messages. 
In addition, multi-mode access affords the user a wider range of
search strategy. Illustrations of these features are presented
in Figures 21-23.

The search strategy portrayed in Figure 21 reveals that the
user wishes to do a combinatorial search based upon more than
two of his index terms. The Document List formed as a result of
the execution of the translated COMBINE command is used to
extract the titles requested in the first message and the authors
requested in the second message. Because the initial request
explicitly referred to the information sectors desired, no
computer-initiated and directed dialogue occurred. The last
message, which may be classified as conversationally dependent,
uses information derived from the previous message in order to
associate Heilman with the sector designator AUTH, or author.

The first message of Figure 22 illustrates the system's
ability to associate the preposition by with each of the past
participles and the index term Johns with each informational
clause. Since no particular informational sectors were indicated
in the request, the system enters into a system-initiated and
directed dialogue to solicit this information. Note that before
the user returns to his initial line of questioning, he has made
reference to both the DEFINE and RELATION files based upon
information revealed to him in the SEARCH mode of operation. This
multi-mode operation is a key feature of Real English.

The  dialogue  dependency  feature  of  Real  English  is  further 
illustrated  in  Figure  23.  In  this  case,  users  A  and  B  have 
PLEASE GIVE ME THE TITLE OF DOCUMENTS ON MORE THAN
TWO OF THE FOLLOWING AREAS HARMONIC ANALYSIS,
NONLINEAR CONTROL, FEEDBACK, DYNAMIC COUPLING.

/ COMBINE (G2) DESC HARMONIC ANALYSIS / NONLINEAR
/ CONTROL / FEEDBACK / DYNAMIC COUPLING **
/ TITL **

* 26 DOCUMENTS HAVE BEEN SELECTED.
* 19 DOCUMENTS INDEXED BY EXACTLY 2 OF THE INDEX TERMS
* 7 DOCUMENTS INDEXED BY EXACTLY 3 OF THE INDEX TERMS
* 0 DOCUMENTS INDEXED BY EXACTLY 4 OF THE INDEX TERMS
*
* ACCESSION NO. 112
* TITL: HARMONIC OSCILLATIONS OF NONLINEAR SYSTEMS
*
* ACCESSION NO. 253
* TITL: NONLINEAR FEEDBACK
*
* ACCESSION NO. 1713
* TITL: HARMONIC ANALYSIS OF STRINGS

WHO WROTE THEM?

/ AUTH **


Figure 21 - Dialogue 1


* ACCESSION NO. 112
* AUTH: M. KLOPTER, P. HEILMAN
*
* ACCESSION NO. 253
* AUTH: J. ARNOLDSEN
*
* ACCESSION NO. 1713
* AUTH: A. B. STEVENS

WHAT ELSE DID KLOPTER WRITE ABOUT HARMONIC ANALYSIS?

/ NUMBER AUTH KLOPTER & DESC HARMONIC ANALYSIS **

* NO REFERENCES HAVE BEEN FOUND.

HOW ABOUT HEILMAN OR STEVENS ON FEEDBACK NETWORKS?

/ NUMBER AUTH ( HEILMAN + STEVENS ) & DESC FEEDBACK
/ NETWORKS **

* NO REFERENCES HAVE BEEN FOUND.
DO YOU HAVE ANYTHING ABOUT READING WRITTEN, EDITED
OR PUBLISHED BY JOHNS?

/ NUMBER ( DESC READING ) & ( AUTH JOHNS + EDIT JOHNS
/ + ISSR JOHNS ) **

*
* 26 DOCUMENTS HAVE BEEN SELECTED
* DO YOU WANT TO SEE ANY?

YES

* INDICATE DESIRED INFORMATION (AUTHOR, TITLE, ETC.)

TITLE

* ACCESSION NO. 111
* TITL: LANGUAGE EXPERIENCE IN THE TEACHING OF READING
*
* ACCESSION NO. 473
* TITL: READING THROUGH PHONICS
*
* ACCESSION NO. 506
* TITL: READING DIFFICULTIES DUE TO EYE DEFECTS

WHAT IS THE LANGUAGE EXPERIENCE?

/ DEFINE LANGUAGE EXPERIENCE **

* LANGUAGE EXPERIENCE -
* AN APPROACH TO READING THAT IS BASED UPON
* PUPIL-MADE MATERIAL.


Figure  22  -  Dialogue  2 


GIVE ME SOME TERMS THAT ARE SPECIFIC TO EYE DEFECTS.

/ RELATION (7) EYE DEFECTS

* EYE DEFECTS -
* SPECIFIC TERMS: MYOPIA, HYPEROPIA, ASTIGMATISM,
* CATARACTS

WHAT DOES HYPEROPIA MEAN?

/ DEFINE HYPEROPIA **

* A CONDITION IN WHICH VISUAL IMAGES COME TO A
* FOCUS BEHIND THE RETINA OF THE EYE.

WHAT DOCUMENTS DO YOU HAVE IN THE FIELD OF READING
DISABILITIES WRITTEN IN EITHER 1967 OR 1968?

/ NUMBER ( DESC READING DISABILITIES ) & ( DATE 1967 +
/ 1968 ) **
User A

WHAT DOCUMENTS DO YOU HAVE CONCERNING REFLECTION?
/ NUMBER DESC REFLECTION **

HOW ABOUT REFRACTION?
/ NUMBER DESC REFRACTION **

DIFFRACTION.
/ NUMBER DESC DIFFRACTION **


User B

GIVE ME THE DEFINITION OF REFLECTION.
/ DEFINE REFLECTION **

HOW ABOUT REFRACTION?
/ DEFINE REFRACTION **

DIFFRACTION.
/ DEFINE DIFFRACTION **


Figure 23 - Dialogue 3


identical follow-up requests. However, their pragmatic
interpretations are dependent upon the previous dialogue, and thus
user A continues to receive information based upon the NUMBER
command whereas user B receives information based upon the DEFINE
command.
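
The dialogue dependency shown in Figure 23 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Real English implementation (which is written in FORTRAN IV and works from a parse tree): the `Dialogue` class, the `OPENERS` table and the literal phrase matching are all hypothetical stand-ins for the syntactic and pragmatic grammars.

```python
from typing import Optional

# Hypothetical lead-in phrases mapped to the command they select.
# The real system derives this from a parse tree, not from string prefixes.
OPENERS = {
    "WHAT DOCUMENTS DO YOU HAVE CONCERNING ": "NUMBER DESC",
    "GIVE ME THE DEFINITION OF ": "DEFINE",
}
FOLLOW_UP = "HOW ABOUT "


class Dialogue:
    """Keeps the command chosen for the previous message of one user."""

    def __init__(self) -> None:
        self.last_command: Optional[str] = None

    def translate(self, message: str) -> str:
        text = message.strip().rstrip("?.").strip()
        for opener, command in OPENERS.items():
            if text.startswith(opener):
                self.last_command = command          # remember for later fragments
                return f"{command} {text[len(opener):]} **"
        if self.last_command is None:
            raise ValueError("fragment received before any complete request")
        # A fragment is interpreted with the command of the previous dialogue.
        term = text[len(FOLLOW_UP):] if text.startswith(FOLLOW_UP) else text
        return f"{self.last_command} {term} **"
```

Two `Dialogue` objects given the same follow-up text produce different commands, because each carries its own previous command.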

The strong linguistic basis inherent in Real English is due
to its syntactical grammar. This grammar is powerful enough to
accept a wide range of syntactical structures and yet flexible
enough to be changed according to future developments. An example
of the logical construction permitted by Real English is
illustrated in Figure 24. Figure 25 shows several different
messages which would be translated into the same command to
extract the documents written by Jones whose subject area is radar.
Such diversity in the structure of user messages demonstrates
the versatility of the Real English system.
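
As a small illustration of the logical construction in Figure 24, the sketch below expands a decade such as "the 1960's" into a disjunction of years and joins it to a subject term. It is a hedged reconstruction, not the author's code: the function names are hypothetical, `+` is taken to be the OR operator, and `&` is assumed to be the AND operator as it appears in the KLOPTER command of Figure 21.

```python
def years_in_decade(decade_start: int) -> str:
    """Return the ten years of a decade joined by the OR operator '+'."""
    return " + ".join(str(year) for year in range(decade_start, decade_start + 10))


def decade_query(decade_start: int, index_term: str) -> str:
    """Build a NUMBER command for documents on index_term dated in the decade."""
    return (
        f"NUMBER ( DATE {years_in_decade(decade_start)} ) "
        f"& ( DESC {index_term} ) **"
    )
```

Calling `decade_query(1960, "BOOLEAN ALGEBRA")` yields a command of the same shape as the second one in Figure 24.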

For  additional  translated  user  messages,  refer  to  Appendix  F. 

6.2 Future Research Goals

Future research associated with this dissertation should
encompass the areas of grammar evaluation, influence of a
computer-initiated and directed dialogue on system performance,
and pragmatic ambiguity resolvers.

To  have  a  truly  useful  information  retrieval  system  with  a 
natural  language  man-machine  interface,  the  grammar  comprising 
the  acceptable  syntactical  structures  must  be  shown  to  handle 
the linguistic style of its users. Using an actual information
retrieval system, experiments should be conducted along these
lines. Messages which cannot be properly parsed should be
I WANT THE AUTHOR, DATE AND TITLE OF ALL MATERIAL DEALING WITH
THE AREA OF COSMIC RADIATION WRITTEN BY SCHWARTZ OR ALLEN BUT
NOT ROBSEN AFTER 1966.

/ NUMBER ( DESC COSMIC RADIATION ) & (( AUTH ( SCHWARTZ +
/ ALLEN ) t ( ROBSEN )) & ( DATE 1966 + 1967 + 1968
/ + 1969 )) ** AUTH ** DATE ** TITL **

GIVE ME ANYTHING WRITTEN IN THE 1960's ON BOOLEAN ALGEBRA.

/ NUMBER ( DATE 1960 + 1961 + 1962 + 1963 + 1964 + 1965 +
/ 1966 + 1967 + 1968 + 1969 ) & ( DESC BOOLEAN ALGEBRA ) **


Figure 24 - Logical Complexity
WHAT DID JONES WRITE ABOUT RADAR?

WHAT HAS BEEN WRITTEN BY JONES ABOUT RADAR?

GIVE ME SOMETHING IN THE AREA OF RADAR BY JONES.

DO YOU HAVE ANYTHING ON RADAR AUTHORED BY JONES?

WHAT HAS JONES WRITTEN ABOUT RADAR?

PAPERS BY JONES ON RADAR.

AUTHORED BY JONES ABOUT RADAR.

I WOULD LIKE MATERIAL ON RADAR BY JONES.

LIST THE PAPERS BY JONES THAT DEAL WITH RADAR.

WHAT DO YOU HAVE ON RADAR WRITTEN BY JONES?

COULD I HAVE MATERIAL ABOUT RADAR THAT WAS WRITTEN BY JONES?


Figure 25 - Diversity of Structure
collected and decisions affecting their inclusion into the grammar
should be made based upon their frequency of occurrence and
relative importance to the total retrieval service.

The computer-initiated and directed dialogue may have two
major effects on system operation: 1) it may affect the linguistic
style of the users, and 2) it may have psychological effects that
may be detrimental to the mental attitude of the user. This
dialogue may be useful in overall system performance by leading
the user to his next request and thereby detouring him from a
line of questioning which the pragmatic interpreter is not yet
prepared to handle. For example, if the user has received the
number of documents satisfying his initial SEARCH mode query and
is unaware of the particular information available to him, his
second request may be something like 'What do I do now?' or
'What next?'. A dialogue initiated immediately after informing
him of the number of documents might lead him to discover
system capabilities and also avoid the above response, which the
system may not be able to handle. On the other hand, constant
interruptions by the system might annoy the user and so act in a
detrimental manner toward system performance. Experiments should
be performed to achieve the proper balance.

With the addition of more and more modes of operation, the
possibility of pragmatic ambiguity increases. Consider a one-mode
system having the SEARCH mode. A query might be: What do
you have related to radar? which would be translated into a
request for the number of documents whose subject matter is radar.
Also consider a one-mode system whose mode is the RELATION mode.
The same query would be translated into a request for a series of
words or phrases associated with radar through one or more
relationships. If a system had both of these modes, an ambiguity
would occur which might be resolved by a man-machine dialogue or
by a profile chart of the user. Such a chart might include, for
example, the user's propensity to use certain words or phrases
when referring to particular system modes. In this way the system
could use past experiences of the user to resolve ambiguities.
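
One way such a profile chart could work is sketched below. The dissertation proposes the chart only as future research, so everything here is an assumption made for illustration: the `ProfileChart` class, its per-phrase counts, and the rule that a tie falls back to a man-machine dialogue (signalled by `None`).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class ProfileChart:
    """Hypothetical per-user record of which mode a phrase has referred to."""

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def record(self, phrase: str, mode: str) -> None:
        """Note that the user used `phrase` when he meant `mode`."""
        self._counts[phrase][mode] += 1

    def resolve(self, phrase: str, candidates: List[str]) -> Optional[str]:
        """Pick the candidate mode used most often with `phrase`.

        Returns None when past experience does not favour a single mode,
        in which case the system would have to ask the user.
        """
        seen = self._counts.get(phrase, Counter())
        ranked = sorted(candidates, key=lambda mode: seen[mode], reverse=True)
        if not ranked:
            return None
        if len(ranked) > 1 and seen[ranked[0]] == seen[ranked[1]]:
            return None
        return ranked[0]
```

For the radar query above, a user who has twice meant the RELATION mode and once the SEARCH mode by "related to" would be routed to RELATION; an unseen phrase would trigger a dialogue instead.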
SENTENCES FROM THE ORAL AND WRITTEN EXPERIMENTS

I would like to have a list of magazine articles on thin films.

I  would  like  a  list  of  references  on  the  following  subjects: 
computer  memory  elements,  computer  stores,  thin  films,  computer 
memory  design. 

Give me anything else that J. R. Brown wrote.

Give  me  anything  Williams  has  written. 

List  all  the  books  which  contain  information  about  computer 
memories  written  by  D.  Simon. 

Supply  list  of  books  on  computer  design. 

Do  you  have  anything  on  thin  film  reliability? 

What  are  the  other  books  written  by  Williams? 

Did  Jenkins  write  any  other  books? 

Are  there  any  other  books  by  Kimon? 

What  about  thin  film  physics? 

Can  you  find  anything  on  thin  film  manufacturing  techniques? 

Give  me  a  list  of  references  by  the  following  authors:  D.  Simon, 
W.  Evans,  S.  T.  Jenkins,  L.  Williams. 

Supply  list  of  books  by  D.  Simon. 

Let me see the references for computer memory design.

Is there anything under memory design logic?

I would like the titles of books written by R. Gray.
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AESTOd  >7,1.  b  •  I'.'ATTO'i 


The term 'interpretive program' is explained below:

First-level response:

"An INTERPRETIVE PROGRAM is a computer program combining
translation and execution."

Second-level response:

"An INTERPRETIVE PROGRAM is a computer program which
receives a sequence of commands in a source language,
examines each command, determines a translation to
replace it in the object language, and executes it if
possible. The major characteristic of an INTERPRETIVE
PROGRAM is that the translation of an instruction is
performed each time the instruction is to be obeyed."

Third-level response:

"An INTERPRETIVE PROGRAM carries out the instructions of
a program written in one language by translating each
instruction of that source language into a sequence of
computer instructions in the language of the computer
being used, and by allowing these computer instructions
to be obeyed before translating the next instruction.

This step-by-step translation and execution becomes
significant when the execution of one instruction causes
a change in the meaning of that instruction or another
one. A new translation of the changed instruction will
be necessary then before it can be correctly executed."


Fourth-level response:

"Consider the following sequence of instructions:

Position   Instruction
1          FETCH 5
2          ADD 6
3          STORE 5
4          GO TO 1
5          ...
6          ...

An INTERPRETIVE PROGRAM might first translate FETCH 5
into 24006 ("bring into the accumulator the contents of
memory position 5") and execute the instruction. Then
it might translate STORE 5 into 02006 ("store the
contents of the accumulator in memory position 5") and
execute that instruction. Finally, it might translate
GO TO 1 into 32001 ("go to the instruction located in
memory position 1 and execute it"). The instruction
located at memory position 1 is FETCH 5. Because the
INTERPRETIVE PROGRAM has carried out all instructions
immediately after translating them, memory position 5
now contains a new value which will be incorporated
into all further instructions involving it. If
translation of all instructions had been completed
before any of them had been executed, such a change
would have been ignored. This demonstrates the major
characteristic of an INTERPRETIVE PROGRAM - that
translation of an instruction is performed each time
the instruction is to be obeyed."
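
The behaviour described in the fourth-level response can be imitated with a short Python sketch. It is illustrative only: the tuple produced by `translate` stands in for the numeric object code, the contents of positions 5 and 6 are hypothetical values, and `max_steps` is added because the four-instruction program loops forever.

```python
from typing import Dict, Tuple, Union

Memory = Dict[int, Union[str, int]]


def translate(instruction: str) -> Tuple[str, int]:
    """Translate one source instruction into an (operation, address) pair."""
    operation, _, operand = instruction.rpartition(" ")
    return operation, int(operand)


def run(memory: Memory, start: int, max_steps: int) -> Memory:
    """Translate and execute one instruction at a time, max_steps times."""
    accumulator = 0
    position = start
    for _ in range(max_steps):
        # The instruction is translated anew every time it is to be obeyed.
        operation, address = translate(str(memory[position]))
        if operation == "FETCH":
            accumulator = int(memory[address])
            position += 1
        elif operation == "ADD":
            accumulator += int(memory[address])
            position += 1
        elif operation == "STORE":
            memory[address] = accumulator
            position += 1
        elif operation == "GO TO":
            position = address
        else:
            raise ValueError(f"unknown instruction: {memory[position]}")
    return memory
```

With positions 5 and 6 holding 10 and 3, the first pass through the loop stores 13 in position 5, and the second pass fetches that new value rather than the original 10.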


GRAMMAR ROUTINES

A restriction is a series of routines with their arguments
which operate on the tree or any of the lists (grammar, word
dictionary, sentence lists). The restrictions are part of the
grammar and therefore determined by the grammarians. However,
the function of the routines in the analyzer program will be
described below. By means of these routines the tree or list
structure may be examined for different properties, e.g.
well-formedness of substructures. A restriction is itself
represented in the machine as a list. Each routine in the
restriction is executed in order; if any routine in the list
fails, the restriction fails. When the restriction is encountered,
the machine is 'looking at' either a node or a word in a list. If
a restriction fails, it always returns to its starting point (node
or list word); if it succeeds, it remains just where the last
routine exited.

If a routine fails, it also returns to its starting point or
leaves the machine 'looking at' a different place depending on
its function. Some routines must start at nodes and others at
list words. Some can differentiate between the two structures
and those may start at either place.

Some routines (e.g. AND) have the property of recursiveness,
i.e. during the execution of the routine, a call is again made
to the routine. Since FORTRAN does not support recursive
subroutines, the recursive routines are put into one subroutine and
a pushdown stack is used to store and restore the appropriate
'locations' needed for the proper operation of these routines.
In this case, the locations are labelled FORTRAN statements.
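
The idea of replacing recursion by a pushdown stack can be sketched as follows. This is not the FORTRAN code: where the original stacks labelled statements to return to, the sketch stacks a small frame holding the operator, its operands and the index at which to resume. The nested-tuple representation and the use of plain booleans for primitive tests are assumptions made for brevity.

```python
def evaluate(expression):
    """Evaluate nested AND / ORR / NOT tests without any recursive call.

    An expression is either a bool (a primitive test that has already
    exited + or -) or a pair (operator, operands).
    """
    stack = []  # pushdown of suspended frames: [operator, operands, next index]
    current = expression
    while True:
        if isinstance(current, bool):
            result = current                  # primitive test finished
        else:
            operator, operands = current
            stack.append([operator, operands, 0])
            result = None                     # nothing has finished yet
        while stack:
            frame = stack[-1]
            operator, operands, index = frame
            if result is not None:            # an operand of this frame finished
                if operator == "NOT":
                    result = not result
                    stack.pop()
                    continue
                if (operator == "AND" and not result) or (operator == "ORR" and result):
                    stack.pop()               # outcome already decided
                    continue
                result = None                 # undecided: try the next operand
            if index < len(operands):
                frame[2] = index + 1          # remember where to resume
                current = operands[index]
                break                         # "call" the operand
            result = operator == "AND"        # operands exhausted
            stack.pop()
        else:
            return result
```

Each `break` plays the part of a call and each `stack.pop()` the part of a return, so the nesting depth of a restriction is limited only by the size of the stack.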

A detailed alphabetical description of the routines will
follow. In order to avoid unnecessary repetition the routines
will be represented as functions and certain symbols will be used
to describe various details.

Namely:

1) F(Z) = a routine that returns to its starting point
only if it fails.

2) T(Z) = a routine that always returns to its starting
point.

3) The following variables will describe the type of
argument of the routine:

a) a = a symbol

b) y = a restriction list

c) A = a list of symbols a1, a2, ..., an

d) Y = a list of restriction lists y1, y2, ..., yn

e) 0 = no argument

f) X = grammar register (special location available to
the grammarian. It is used in a restriction for
storing and retrieving nodes or list words.)

4) The subscripts V1-V2 attached to a T or F indicate where
the machine must start (V1 position) and where it will
end (V2 position).

a) V = N represents a node

b) V = L represents a list word

c) V = 0 means that the routine's functioning is
independent of the starting (V1) or stopping
(V2) point.


AND
    Type: T0-0(Y)
    Operation: Test that all yi's exit +.

ATTRB
    1. Type: FN-L(0)
       Operation: The current node NS must be atomic. Therefore, it
       corresponds to category S of the word which matches NS. Go to
       the sublist of category S.
    2. Type: FL-L(0)
       Operation: Go to the sublist of the current list word.
    3. Type: FN-L(A)
       Operation: Perform 1, then go down the sublist until an ai is
       reached.
    4. Type: FL-L(A)
       Operation: Perform 2, then go down the list until an ai is
       reached.

If the operation can be performed, the routine is successful
and exits +; otherwise, the routine fails and exits -. If
every routine in a restriction y exits +, then y itself exits
+; if any routine in y exits -, y exits -.
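
The exit convention for routines and restrictions can be summarized in a brief Python sketch. It is a simplified model under stated assumptions: a position is represented by a plain string, a routine is any function that returns a new position on a + exit or `None` on a - exit, and the names `run_restriction`, `go_down` and `always_fail` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Position = str                                        # stand-in for a node or list word
Routine = Callable[[Position], Optional[Position]]    # None means the routine exits -


def run_restriction(routines: List[Routine], start: Position) -> Tuple[bool, Position]:
    """Execute the routines of a restriction in order.

    Returns (True, final position) if every routine exits +, and
    (False, start) as soon as one routine exits -, so that a failed
    restriction always leaves the machine at its starting point.
    """
    position = start
    for routine in routines:
        outcome = routine(position)
        if outcome is None:
            return False, start
        position = outcome
    return True, position


def go_down(position: Position) -> Optional[Position]:
    """Hypothetical routine that always succeeds and moves one level down."""
    return position + "/child"


def always_fail(position: Position) -> Optional[Position]:
    """Hypothetical routine that always exits -."""
    return None
```

A restriction made of two `go_down` routines ends two levels below its start, whereas adding `always_fail` to the list returns the machine to the start.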
BITIN*
    1. Type: T0-0(a1,a2,a3)
       Operation: Set (if a1 = 1) or reset (if a1 ≠ 1) the a3 bit of
       the halfword represented by the a2 symbol.
    2. Type: [illegible]
       Operation: The computer is looking at a grammar list. This
       list contains numbers representing the bits of the a2 halfword
       to be set (a1 = 1) or reset (a1 ≠ 1).

BITT*
    1. Type: T0-0(a1,a2)
       Operation: Test that the a2 bit of the a1 halfword is set.
    2. Type: TL-L(a1)
       Operation: The computer is looking at a grammar list. This
       list contains numbers representing the bits of the a1 halfword
       that must be set if the routine is to pass.

CANDO
    Type: T0-0(y)
    Operation: Test that y can be executed successfully.

* In BITIN, BITT, SETT, TSET the symbols used are such as to
yield a number one hundred larger than desired. The numbers
referred to in the description are the numbers after 100
is subtracted.
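
A rough Python model of BITIN and BITT, including the offset of one hundred mentioned in the footnote, is given below. Several details are assumptions rather than facts from the text: the argument order (flag, halfword, bit for BITIN; halfword, bit for BITT), the numbering of bits from the low-order end, the 16-bit mask, and the use of a dictionary keyed by halfword name.

```python
from typing import Dict

OFFSET = 100          # symbols yield a number one hundred larger than desired
HALFWORD_MASK = 0xFFFF


def bitin(halfwords: Dict[str, int], a1: int, a2: str, a3: int) -> None:
    """Set (a1 - 100 == 1) or reset (otherwise) bit a3 - 100 of halfword a2."""
    flag = a1 - OFFSET
    bit = a3 - OFFSET
    if flag == 1:
        halfwords[a2] |= 1 << bit
    else:
        halfwords[a2] &= ~(1 << bit)
    halfwords[a2] &= HALFWORD_MASK


def bitt(halfwords: Dict[str, int], a1: str, a2: int) -> bool:
    """Test that bit a2 - 100 of halfword a1 is set."""
    return bool(halfwords[a1] & (1 << (a2 - OFFSET)))
```

For example, `bitin(h, 101, "FLAGS", 103)` sets bit 3 of the hypothetical halfword `FLAGS`, after which `bitt(h, "FLAGS", 103)` passes.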




CHECK
    Type: TL-L(A)
    Operation: Test whether the current word has an ai on its
    category list.

CLSSL
    Type: FL-L(A)
    Operation: Go to the place in the present list that has an ai
    category.

COMMN
    1. Type: [illegible]
       Operation: Test that a symbol in A1 matches a symbol in A2.
    2. Type: T0-0(y,A2)
       Operation: Test that the following be done: Execute y
       successfully. If y leads to a node NS, form a list consisting
       of the symbol S. If y leads to a list, set the list equal to
       A1. Go to 1.
    3. Type: T0-0(A1,y)
       Operation: Do step 2 for A2.
    4. Type: T0-0(y1,y2)
       Operation: Do step 2, then 3.

DNRIT
    Type: FN-N(0)
    Operation: Go to the rightmost node, one level below the
    current node.
DNTRN
    Type: FN-N(A1,A2,A3)
    Operation: Descend to a node Na1i below the current node.
    Descend in the following manner: (1) Set m = 1; (2) Descend to
    an Na1i m levels below the current node, scanning from left to
    right. If there are none, set m = m+1 and go to (2). During
    the descent, if any node in level m is an Na3i or is
    nontransparent do not go below it, unless it is an Na2i.
    Exit when a further descent is no longer possible, either
    because there are no more transparent nodes or because the
    lowest node of the tree has been reached.

DOWN
    Type: [illegible]
    Operation: Go to the node directly below the current node.
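
The level-by-level descent that DNTRN performs can be sketched in Python as below. The sketch rests on assumptions: the `Node` class with a `transparent` flag is a hypothetical stand-in for the parse-tree representation, and the three name sets `targets`, `passable` and `blocked` play the roles of the argument lists A1, A2 and A3 as reconstructed from the scan.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    transparent: bool = True
    children: List["Node"] = field(default_factory=list)


def dntrn(current: Node, targets: Set[str], passable: Set[str],
          blocked: Set[str]) -> Optional[Node]:
    """Return the first target node found m levels below `current`,
    trying m = 1, 2, ... and scanning each level from left to right.

    The descent does not go below a node that is blocked or
    nontransparent unless that node is passable. None means failure.
    """
    level = list(current.children)            # the nodes one level below
    while level:
        for node in level:
            if node.name in targets:
                return node
        next_level: List[Node] = []
        for node in level:
            may_descend = node.name in passable or (
                node.transparent and node.name not in blocked
            )
            if may_descend:
                next_level.extend(node.children)
        level = next_level                    # m = m + 1
    return None
```

Calling the same function with an empty `blocked` set and every node treated as transparent gives the behaviour described for DWNTO.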
DOWN1
    Type: FN-N(0)
    Operation: Go to the node directly below the current node.
    The node below must have no node to its right (except for
    special process nodes).

DSQLF
    Type: TN-N((y1) ... (yn))
    Operation: Test whether all yi's can be executed successfully:
    1. Set i = 1.
    2. Empty all grammar registers.
    3. Execute yi successfully and return to the starting point.
    4. Set i = i+1, go to 2.

DWNTO
    1. Type: FN-N(A)
       Operation: Go to a node Nai below the current node using the
       same manner of descent as DNTRN except that all nodes are
       treated as if they were transparent.
    2. Type: [illegible]
       Operation: Go to the place in the present list that has an ai
       category.
EDIT
    Type: FL-L(y)
    Operation: The machine must be 'looking at' the first option of
    a list L of options. Generate a list L' of options from L in
    the following manner:
    1. Set i = 1.
    2. Look at the ith word of L (which points to the ith option
       of L).
    3. Test whether y can be executed successfully. If so, put the
       ith word of L in L' and go to 4; if not, go to 4.
    4. If there is another option in L, set i = i+1 and go to 2;
       if not, go to 5.
    5. Look at L'. It must have at least one option.
EMPTY
    Type: TN-N(0)
    Operation: Test whether the current node is empty, i.e. that
    no node in its substructure corresponds to a word of the
    sentence.

EXEC
    1. Type: FN-L(0)
       Operation: NS is the current node; NSj is below it. Look at
       the (j+1)-st word of the string S.
    2. Type: FN-0(RT)
       Operation: Perform 1, then get the restriction Ri and find
       the routine RT on Ri. If RT is found, execute its argument;
       if RT is not found, find the routine OLIST and execute its
       argument.

EXPOT
    Type: F0-0(y)
    Operation: Execute y.

FETCH
    Type: F0-N(0)
    Operation: Used in conjunction with FILLIN. Go to the node on
    top of the STAC pushdown produced by FILLIN. If STAC is empty,
    FETCH fails.


FILLIN
    Type: [illegible]
    Operation: For each ai in the substructure of the current node,
    place its location in the STAC pushdown. If there are no such
    nodes, FILLIN fails.

FIND
    1. Type: FL-L(RT)
       Operation: The present list should be a restriction list. Go
       to the place in the present list that has the routine RT.
       Look at the argument of RT.
    2. Type: FN-L(RT)
       Operation: NSi is the current node. NSij is the node below
       it. Look at the restriction list on option Si. Go to 1.
    3. Type: FL-L(RT(A))
       Operation: a) Perform 1. If it is successful, go to b);
       otherwise, exit -. b) Check whether there is an ai on the
       argument list of RT. If there is, exit +; otherwise, go to
       3a) to find the next RT on the restriction list.
    4. Type: [illegible]
       Operation: Perform 2. Then go to 3b).

FRSTL
    Type: [illegible]
    Operation: Go to the place in the sentence list corresponding
    to the word that was current just before the current node was
    constructed.

GENER
    Type: [illegible]
    Operation: The current node must be a special process node. At
    least one node NSik must be to the left of it. Generate an
    option list T, so that:
        T1 = Sik
        T2 = Si(k-1) and Sik
        ...
        Tk = Si1 and Si2 and ... Sik
    Look at T.
IMPLY
    Type: T0-0(y1,y2)
    Operation: If y1 exits -, IMPLY exits +; if y1 exits +, then
    IMPLY succeeds only if y2 exits +.

INTOL
    1. Type: FN-L(0)
       Operation: NSij is the current node. Look at the jth word of
       the option Si.
    2. Type: FL-L(0)
       Operation: The current list word is pointing to another list.
       Go to that list.

ISIT
    1. Type: TN-N(A)
       Operation: Is the current node an Nai?
    2. Type: [illegible]
       Operation:
       a) If the current list word is an option: is the first
          element in the option an ai?
       b) If the current list word is an element of an option: is
          the element an ai?
       c) If the current list word is a symbol: is the symbol an ai?

ITER
    Type: F0-0(y1,y2)
    Operation: Execute y1 successfully in the following manner:
    a) Execute y1. If it is successful, ITER exits +. If it is not
       successful, go to b).
    b) Execute y2. If it is successful, go to a). If it is not,
       ITER fails.

[name illegible]
    Type: [illegible]
    Operation: NS is the current node. Go to the place in the
    sentence list corresponding to the word that was current when
    NS was completed.

LEFT
    Type: FN-N(0)
    Operation: Go left until the first node which is not a special
    process node is reached.
LOOKT
    Type: [illegible]
    Operation: Go to whatever is stored in X. If X is empty,
    exit -.

MARK
    Type: T0-0(A)
    Operation: MARK must be retrieved and its argument A looked at
    by FIND. This enables the options of a string definition to be
    assigned properties which may be tested by other parts of the
    grammar. There may be several MARK routines (for different
    types of property lists) in one restriction; in that case A
    must contain a special symbol to identify which type of list
    it is. For example, if the symbol P identifies a list of
    prepositions, that list may be obtained by using the following
    command: FIND(MARK(P)).
NEXTAT
    1. Type: [illegible]
       Operation: Starting at atomic node. Go to next atomic node on
       tree.
    2. Type: FN-N(0)
       Operation: Starting at non-atomic node. Go to next atomic
       node on tree.

NEXTL
    1. Type: FL-L(0)
       Operation: Go to the next list word.
    2. Type: [illegible]
       Operation: NSij is the current node. Go to the (j+1)-st word
       of option Si.

NOATM
    Type: TN-N(0)
    Operation: Is the current node non-atomic?

NOT
    Type: T0-0(y)
    Operation: If y exits +, NOT exits -; if y exits -, NOT
    exits +.

NOTEL
    Type: F0-L(A)
    Operation: Look at list A.

ORR
    Type: T0-0(Y)
    Operation: The yi's are executed successively starting with y1.
    As soon as any yi exits +, ORR exits +. If no yi exits +, ORR
    exits -.
ORPTH
    Type: F0-0(Y)
    Operation: The execution is identical to ORR. On completion,
    ORPTH remains where the successful yi has brought it.

PARSE
    Type: T0-0(0)
    Operation: Was a parse obtained for this sentence?

PLACE*
    1. Type: TN-N(0)
       Operation: Place EBCDIC representation of index term for
       current node into ARGMT buffer.
    2. Type: [illegible]
       Operation: The symbol a1 is not the 500th element of symbol
       table. Place this number into ARGMT buffer.
    3. Type: TL-L(a1)
       Operation: The symbol a1 is the 500th element of symbol
       table. Place the value of the current list word into ARGMT
       buffer.
    4. Type: [illegible]
       Operation: The symbol a1 is the 500th element of symbol
       table. Place the value of the a2 halfword into ARGMT.

* The symbols a1, a2 are such that they have the symbol table
location of the desired number.

PREVL
    Type: FL-L(0)
    Operation: Go to the previous list word.

RIGHT
    Type: FN-N(0)
    Operation: Go right until the first node which is not a special
    process node is reached.

SCOPE
    Type: FN-L(0)
    Operation: The current node must be a special process node.
    The node NSik would have been attached if the special process
    mechanism had not been interrupted. Generate an option list T
    so that:
        T1 = Sik
        T2 = Sik and Si(k+1)
        ...
        Tk = Sik and Si(k+1) and ... and Sin
    Look at T.
SENTL
    Type: F0-L(0)
    Operation: Go to the place in the sentence list corresponding
    to the first word of the sentence.

SETT*
    1. Type: T0-0(a1,a2)
       Operation: Set the a1 halfword to the value of a2.
    2. Type: TL-L(a1)
       Operation: Set the a1 halfword to the value of the current
       list word.

SPCFY
    Type: F0-L(S)
    Operation: Set up a list T (of options) composed of one option
    T1. T1 is composed of one element S. Look at T.

SPECF
    Type: [illegible]((y1) ... (yn))
    Operation: This routine must be on the restriction Ri on S and
    it always exits +. It will either find a substitute set of
    options for S, or leave S unchanged.
    1. Set i = 1.
    2. Empty all grammar registers.
    3. Execute yi.
    4. If yi is successful, the machine is now 'looking at' a
       substitute set of options. Ignore the remaining routines on
       Ri and return to PARSER with a 'substitution' signal.
    5. If yi is not successful, set i = i+1 and go to 2.

STORE
    Type: T0-0(X)
    Operation: Store the address of the current node or list word
    in X.

SUBJR
    Type: T0-0(y)
    Operation: SUBJR must be retrieved and y executed by EXEC.
    y should be the path to the subject.

TRUE
    Type: T0-0(0)
    Operation: Always exit +.
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Raae  of  Routine 

Tjrge 

Operation 

TSET* 

Test  that  the  a^  halfword 

has  the  value  &g  • 

UPON* 

*W°> 

Go  to  the  parent  node  of 

the  current  node. 

UPTO 

^N-n(A) 

Go  to  an  Nau  above  the 

current  node. 

IFTRN 

**.*^*2*3) 

Go  to  an  Na^  above  the 

current  node;  however,  do 

not  go  above  an  Na^  or  e 

non-transparent  node 

unless  it  is  an  Na^. 

VERBR 

^o-o  (y) 

This  is  a  non-executable 

routine.  It  must  be 

found  and  y  executed  by 

EXEC,  y  is  usually  the 

path  to  che  verb. 

WELLE 

v»-o<(yi»i) 

WELLF must be retrieved and its argument y executed by PARSER after NS is complete. Its argument is executed in the same manner as that of DSQLF.

WORDL

Go to the place in the sentence list corresponding to the current word W.
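
The five numbered steps given in the table above (set i, empty the grammar registers, execute y_i, return with a 'substitution' signal on success, otherwise advance i) describe a simple try-in-order loop. The following is a minimal Python sketch of that loop, not the original implementation. The names `try_substitutes` and `clear_registers` are hypothetical, success is modeled as a routine returning a non-None option set, and the behavior when every routine fails (return None, meaning S is left unchanged) is an assumption drawn from the statement that the routine either finds a substitute set of options or leaves S unchanged.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical representation: a set of options is a list of options,
# and each option is a list of element names.
Options = List[List[str]]


def try_substitutes(
    routines: Sequence[Callable[[], Optional[Options]]],
    clear_registers: Callable[[], None],
) -> Optional[Options]:
    """Run each routine y_i in order and return the first substitute found.

    Returns None when no routine succeeds (assumed to mean that S is
    left unchanged).
    """
    for routine in routines:        # steps 1 and 5: i = 1, then i = i + 1
        clear_registers()           # step 2: empty all grammar registers
        substitute = routine()      # step 3: execute y_i
        if substitute is not None:  # step 4: success, signal a substitution
            return substitute
    return None                     # assumption: nothing matched, S unchanged
```

A caller standing in for PARSER would treat a non-None result as the 'substitution' signal and a None result as "continue with S as it was".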


APPENDIX D

PRAGMATIC CATEGORIES

N41A - indicates general information

    data, stuff, material, information

N41B - indicates definite article-type

    article(s), book(s), work(s), number(s), paper(s), document(s), publication(s)

N42 - indicates bibliographic noun (used to set bibliographic command)

    date(s), author(s), year(s), editor(s), issuer(s), title(s), writer(s), abstract(s), co-author(s), publisher(s), description

N43 - indicates word sequence type

    word(s), phrase(s)

N45 - indicates expansion noun

    example(s), illustration(s)

N46 - indicates system branch noun

    N46A - file(s), system(s), library
    N46B - thesaurus, lexicon
    N46C - dictionary

N47 - subclass of N41 that may have accession number as a right adjunct

    article(s), number(s), document(s), paper(s)

N48 - indicates definition

    definition, meaning



N49 - indicates synonym

    synonym(s)

*03 - indicates a possessive index term

    Smith's, Jones'

BN - indicates sector designator noun

    date(s), author(s), year(s), editor(s), issuer(s), title(s), writer(s), co-author(s), publisher(s)

BVC - bibliographic verbs

    authored, co-authored, appeared, deal, dated, deals, dealt, dealing, edited, issued, listed, appearing, dealing, produced, written, pertaining, published

RVC - relational verbs

    relate, related, relates, relating

TVC - thesaurus verbs

    begin(s), beginning, start(s), starting

SC - set-command verbs

    written, authored, co-authored, published, issued, edited, produced, write, edit, produce, author, issue, publish, mean

BV - sets bibliographic command verbs

    wrote, edited, produced, published, issued, written
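
The category lists above amount to a lexicon in which one word may carry several pragmatic categories (for example, "edited" is listed under both BVC and BV). The sketch below is one possible way to hold a few of these lists in Python and look a word up. It is illustrative only: the names `RAW_CATEGORIES`, `expand`, and `categories_of` are hypothetical, only four of the categories are included, and the treatment of "(s)" as an optional plural is an assumption about the notation.

```python
import re
from typing import Dict, List, Set

# A subset of the category lists, copied from the appendix.
RAW_CATEGORIES: Dict[str, str] = {
    "BN": "date(s), author(s), year(s), editor(s), issuer(s), title(s), "
          "writer(s), co-author(s), publisher(s)",
    "BVC": "authored, co-authored, appeared, deal, dated, deals, dealt, "
           "dealing, edited, issued, listed, appearing, produced, written, "
           "pertaining, published",
    "TVC": "begin(s), beginning, start(s), starting",
    "BV": "wrote, edited, produced, published, issued, written",
}


def expand(entry: str) -> Set[str]:
    """Expand 'author(s)' into {'author', 'authors'}; leave other entries alone."""
    match = re.fullmatch(r"(.+)\(s\)", entry)
    if match:
        stem = match.group(1)
        return {stem, stem + "s"}
    return {entry}


# Invert the lists into a word -> categories lexicon.
LEXICON: Dict[str, Set[str]] = {}
for category, word_list in RAW_CATEGORIES.items():
    for entry in word_list.split(","):
        for word in expand(entry.strip()):
            LEXICON.setdefault(word, set()).add(category)


def categories_of(word: str) -> List[str]:
    """Return the sorted category codes for a word, or an empty list."""
    return sorted(LEXICON.get(word.lower(), set()))
```

With this layout, a word that is absent from every list simply yields an empty result, which a caller could treat as "not a pragmatic keyword".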
APPENDIX E

SB4-VALUES

Value   Word

1       written, authored, co-authored
2       edited
3       published, issued
4       dated
5       concerned
6       entitled
9       characterised, indexed
20      in
21      on
22      by
23
24      from
25      between
26      during
27      before
28      after
29      under
30      about
31      since
32      earlier
33      around
34      to
35      of
38      u
40      prior
51      >
52      and
53      or
54      but
55      either
56      neither
57      both
58      as
59      well
60      not
61      only
62      also
63      nor
71      dealing, pertaining
72      appearing
73      concerning, regarding, describing, covering
74      having
81      file, library, system
82      that
83      teletype, printer
91      author(s), co-author(s), writer(s)
92      year(s), date(s)
93      title(s)
94      editor(s)
95      issuer(s), publisher(s)
96      field(s), area(s), topic(s), subject(s)
97      word(s)
98      publications, journal(s)
99      interval, period
101     written, wrote, write, authored, co-authored
102     edited, edit
103     published, produced, issued, publish, issue, produce, co-author
APPENDIX F

SAMPLE QUERIES

1. Give me everything written, edited, or published by Jones.

NUMBER ((( AUTH JONES ) + ( EDIT JONES )) + ( ISSR JONES )) **

2. What is generic to radar?

RELATION (8) RADAR **

3.  Do  you  have  something  about  radar? 

NUMBER DESC RADAR **

4. Give me some synonyms of automobile.

SYN  AUTOMOBILE  ** 

5.  What  does  radar  mean? 

DEFINE  RADAR  ** 

6a.  I  want  anything  by  Jones. 

NUMBER  AUTH  JONES  ** 

6b. How about Allen.

NUMBER  AUTH  ALLEN  ** 

7.  I  want  something  related  to  radar. 

RELATION  RADAR  ** 

8. What is radar?

DEFINE  RADAR  ** 

9.  Give  me  anything  in  the  thesaurus  starting  with  ABB. 

THES/x  ABS  ** 

10. Could I have data concerning the theory of salt with sugar?

NUMBER DESC THEORY SALT SUGAR **

11. Give me anything on radar and anything on sonar.

NUMBER DESC RADAR **

12. Give me everything on either sonar or laser.

NUMBER DESC ( SONAR ) + ( LASER ) **

13. What books do you have on radar?

NUMBER DESC RADAR **

14. What do you have on radar?

NUMBER  DESC  RADAR  ** 

15.  Who  has  written  anything  on  radar? 

NUMBER  DESC  RADAR  **  AUTH  ** 

16.  Define  radar. 

DEFINE  RADAR  ** 

17. Look up radar in the dictionary.

DEFINE  RADAR  ** 

18.  I  want  radar  defined. 

DEFINE  RADAR  ** 

19. What has Jones written on radar?

NUMBER  ((  AUTH  JONES  ))  *  (  DESC  RADAR  )  ** 

20.  Give  me  the  author  of  document  110. 

FORM  110  **  AUTH  ** 

21.  What  have  Jones  and  Allen  written  about  radar? 

NUMBER (( AUTH ( JONES ) * ( ALLEN ))) * ( DESC RADAR ) **

22. What has Jones written?

NUMBER ( AUTH JONES ) **

23. What do you have written by Jones?

NUMBER AUTH JONES **

24.  I  want  anything  related  to  wave  propagation  and  time 
dependent  transforms. 

RELATION  WAVE  PROPAGATION,  TIME  DEPENDENT  TRANSFORMS  ** 

25. I want the author and date of publication of documents 110, 120, 130.

FORM 110 , 120 , 130 ** AUTH ** DATE **

26. I want the author and date of documents 110, 120, 130.

FORM 110 , 120 , 130 ** AUTH ** DATE **

27. Synonyms of radar.

SYN RADAR **

28a.  Documents  by  Jones. 

NUMBER  AUTH  JONES  ** 

28b. And Allen.

NUMBER  AUTH  ALLEN  ** 

28c.  Smith 

NUMBER  AUTH  SMITH  ** 

29. Give me the author, title and issuer of something pertaining to radar.

NUMBER DESC RADAR ** AUTH ** TITL ** ISSR **

30. I want the bibliographic information of document 130.

FORM  130  **  DESC/BIBLIO  ** 

31. What could I have written by Jones?

NUMBER AUTH JONES **

32. Please give me all of the documents written after 1960 about radar.

NUMBER ( DATE 1960 + 1961 + 1962 + 1963 + 1964 +
1965 + 1966 + 1967 + 1968 + 1969 + 1960 +
1961 + 1962 + 1963 + 1964 + 1965 + 1966 +
1967 + 1968 + 1969 ) * ( DESC RADAR ) **

33. Give me everything between AB and AZ in the thesaurus.

THES/BW AB, AZ **

34. Give me anything written in 1960 by either Alan or Smithe.

NUMBER ( DATE 1960 ) * ( AUTH ( ALAN ) + ( SMITHE )) **

35. I want anything by two of the following authors: Greene, Holden, Allen, Wills.

COMBINE (2) AUTH GREENE / HOLDEN / ALLEN / WILLS **

36. I want all the stuff Jones has written.

NUMBER ( AUTH JONES ) **

37. Anything by more than two but less than four of the following terms: radar, sonar, laser, maser, pacer.

COMBINE (G2AL4) DESC RADAR / SONAR / LASER / MASER / PACER **

38. What is the definition of radar, sonar, and laser?

DEFINE RADAR, SONAR, LASER **

39. What has Jones written, edited or published about radar?

NUMBER (( AUTH JONES ) + (( EDIT JONES ) + (( ISSR JONES )))) * ( DESC RADAR ) **

40. Give me any word around ST in the thesaurus.

THES/AR ST **

41. I want all that you have on radar, sonar and laser.

NUMBER DESC ( RADAR * SONAR ) * ( LASER ) **

42. Give me anything radar is generic to.

RELATION (7) RADAR **

43.  What  did  Jones  edit  about  radar? 

NUMBER (( EDIT JONES )) * ( DESC RADAR ) **

44. By not less than two of the following authors: Hopsy, Wilson, Pett, Robbin, Cyde.

COMBINE (GE2) AUTH HOPSY / WILSON / PETT / ROBBIN / CYDE **

45. Indexed by more than two of the following terms: radar, sonar, laser, pacer.

COMBINE (G2) DESC RADAR, SONAR, LASER, PACER **

46. By two or three of the following: AB, CD, EF.

COMBINE (203) DESC AB / CD / EF **

47. Generic to radar, sonar and laser.

RELATION (8) RADAR, SONAR, LASER **

48. I want all the words that radar, sonar and laser are generic to.

RELATION (7) RADAR, SONAR, LASER **
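
The sample outputs above follow a small number of surface patterns: a command word, a parenthesized Boolean expression or a slash-separated term list, and a closing `**`. The Python sketch below formats two of those patterns as strings. It is a hedged illustration and not the translator described in the dissertation: the function names are hypothetical, reading `+` as OR is inferred from query 1 ("written, edited, or published"), and the left-nested grouping simply copies the shape of that query's output.

```python
from functools import reduce
from typing import Sequence


def field_term(field: str, value: str) -> str:
    """Format one search term, e.g. '( AUTH JONES )'."""
    return f"( {field} {value.upper()} )"


def or_chain(terms: Sequence[str]) -> str:
    """Join terms with '+', nesting to the left as in sample query 1."""
    return reduce(lambda left, right: f"({left} + {right})", terms)


def number_command(expression: str) -> str:
    """Wrap an expression in a NUMBER command with the '**' terminator."""
    return f"NUMBER {expression} **"


def combine_command(condition: str, field: str, values: Sequence[str]) -> str:
    """Format a COMBINE command with a slash-separated term list."""
    joined = " / ".join(value.upper() for value in values)
    return f"COMBINE ({condition}) {field} {joined} **"


# Usage: reproduce the command strings shown for queries 1 and 44.
query_1 = number_command(
    or_chain([field_term(f, "Jones") for f in ("AUTH", "EDIT", "ISSR")])
)
query_44 = combine_command(
    "GE2", "AUTH", ["Hopsy", "Wilson", "Pett", "Robbin", "Cyde"]
)
```

Keeping the formatting in small helpers like these separates the question of which command to issue from the question of how it is spelled, which mirrors the division between establishing the set of commands and forming the proper syntax for each one.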


UNCLASSIFIED

DOCUMENT CONTROL DATA - R & D

(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author)

The Moore School of Electrical Engineering
Philadelphia, Pennsylvania

3. REPORT TITLE

REAL ENGLISH: A TRANSLATOR TO ENABLE NATURAL LANGUAGE MAN-MACHINE CONVERSATION

4. DESCRIPTIVE NOTES (Type of report and inclusive dates)

Scientific Interim

5. AUTHOR(S) (First name, middle initial, last name)

Harvey Cautin

6. REPORT DATE

May 1969

7b. NO. OF REFS

8a. CONTRACT OR GRANT NO.

b. PROJECT NO. 97M»OI

61102F
681304

9a. ORIGINATOR'S REPORT NUMBER(S)

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)

10. DISTRIBUTION STATEMENT

This document has been approved for public release and its distribution is unlimited.

11. SUPPLEMENTARY NOTES

TECH, OTHER

12. SPONSORING MILITARY ACTIVITY

Air Force Office of Scientific Research
Directorate of Information Sciences (SRI)
Arlington, Virginia 22209



